Abstract. We study the attribution problem, that is, the problem of attributing a change in the value of
a characteristic function $f$ to its independent variables. We make three contributions. First, we propose
a formalization of the problem based on a standard cost sharing model. Second, we show that there is
a unique attribution method that satisfies Dummy, Additivity, Conditional Nonnegativity, Affine Scale
Invariance, and Anonymity for all characteristic functions that are the sum of a multilinear function and
an additive function. We term this the Aumann-Shapley-Shubik method. Conversely, we show that such a
uniqueness result does not hold for characteristic functions outside this class. Third, we study multilinear
characteristic functions in detail; we describe a computationally efficient implementation of the
Aumann-Shapley-Shubik method and discuss practical applications to pay-per-click advertising and portfolio analysis.



1. Introduction 

1.1. The Attribution Problem. Consider a function $f(r_1, \ldots, r_n)$ of several variables $r_1, \ldots, r_n$. Given
a change in the values of these variables, we ask what portion of the overall change is due to the change in
each variable $r_i$. In particular, we would like to divide the responsibility for the overall change among the
variables in an axiomatic way. We term such problems attribution problems and the responsibilities assigned
attributions. The attribution to the $i$-th variable can be more interesting than simply the change $s_i - r_i$ in
the variable because the relationship between the magnitude of the change in a variable and the impact it
has on $f$ depends on the form of $f$. For instance, a tiny change in a variable could have a huge impact on
the value of the function.

Formally, we are given a real-valued characteristic function $f : \mathbb{R}^n \to \mathbb{R}$ of $n$ variables and initial and
final values $r_i$ and $s_i$ for the independent variables. Here, the function $f$ is deterministic, not learned from
data, and the values of $r$ and $s$ are known exactly and are not estimates in any sense. Our objective is to
find attributions $z_1(r, s, f), \ldots, z_n(r, s, f)$, where we interpret $z_i(r, s, f)$ as the portion of the change in $f$
due to the change in the $i$-th variable, so that $z_1(r, s, f) + \cdots + z_n(r, s, f) = f(s) - f(r)$, which we call the
completeness condition on the attribution. We interpret completeness as meaning that all the change in $f$
is accounted for. (We often omit the characteristic function and simply write $z_i(r, s)$ for $z_i(r, s, f)$.)

As we discuss attribution, we will keep the following motivating example in mind. See Section 2 for other
examples and a broader discussion of the applicability of our techniques.

Example 1.1. Consider a firm that repeatedly procures a good from a foreign supplier for use in its
manufacturing process. It incurs some expenditure, the product $e = a \cdot p \cdot c$ of the amount $a$ of the good that
the buyer purchases, the average cost per unit $p$ of the good in the foreign currency, and the conversion rate
$c$ between the foreign and local currencies. We take $e(a, p, c)$ as the characteristic function. The final values
of $e$, $a$, $p$, and $c$ may be statistics from a certain quarter, and the initial values may be statistics from the
preceding quarter. The attribution problem, then, is to divide responsibility for the change in $e$ among the
changes in $a$, $p$, and $c$.

Suppose further that the demand for the good comes from the manufacturing department (so an
improvement in the efficiency of manufacturing reduces $a$), that the price for the good is negotiated by the
procurement department (so an improvement in the negotiation process decreases $p$), and that the exchange
rate is exogenously determined. Such an attribution could then serve to apportion blame between or
determine bonuses for the two departments.

How can we attribute the change in the characteristic function $f$ to the various variables? If $f$ were
linear, that is, if $f$ took the form $f(r_1, \ldots, r_n) = \sum_i b_i r_i$, then for a change from $r$ to $s$, it is natural to
attribute $b_i \cdot (s_i - r_i)$ to the $i$-th variable. For non-linear functions such as the one in Example 1.1, if the
independent variables all change slightly, we could replace $b_i$ by the partial derivative of $f$ with respect to $r_i$ at
$s$, giving a linear approximation of $f$ locally at the final value and performing attribution as above.

However, if the changes in the variables are not slight, then this approach would badly violate
completeness. For instance, in Example 1.1 suppose $a$ changes from 4 to 5, $p$ changes from 1 to 12, and $c$ changes from
1 to 1.5. Using this approach, the attributions to $a$, $p$, and $c$ are $(5 - 4) \cdot 12 \cdot 1.5 = 18$, $5 \cdot (12 - 1) \cdot 1.5 = 82.5$,
and $5 \cdot 12 \cdot (1.5 - 1) = 30$, respectively. This assigns the two departments and the exogenous currency rate
change blame for a total of $18 + 82.5 + 30 = 130.5$ of change; but the total change is only $5 \cdot 12 \cdot 1.5 - 4 \cdot 1 \cdot 1 = 86$.
This means that the attribution violates completeness, making it difficult to interpret practically.
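
The arithmetic in this example can be reproduced with a short script. The following is a minimal sketch in Python, not part of the original analysis; the names `expenditure`, `expenditure_grad`, and `endpoint_attributions` are illustrative choices. It evaluates the partial derivatives of the expenditure function at the final point and compares the sum of the resulting attributions with the actual change.

```python
def expenditure(x):
    """Characteristic function e(a, p, c) = a * p * c from Example 1.1."""
    a, p, c = x
    return a * p * c


def expenditure_grad(x):
    """Partial derivatives of e with respect to a, p, and c."""
    a, p, c = x
    return [p * c, a * c, a * p]


def endpoint_attributions(grad, r, s):
    """Attribute (s_i - r_i) times the i-th partial derivative taken at s."""
    g = grad(s)
    return [gi * (si - ri) for gi, si, ri in zip(g, s, r)]


r = (4.0, 1.0, 1.0)    # initial values of a, p, c
s = (5.0, 12.0, 1.5)   # final values of a, p, c

attrs = endpoint_attributions(expenditure_grad, r, s)
total_attributed = sum(attrs)
actual_change = expenditure(s) - expenditure(r)
```

With these inputs the attributions sum to 130.5 while the actual change is 86, so the gap of 44.5 is the amount by which completeness fails.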

It might seem in this example that the failure of completeness originated from a poor choice of point
approximation for the partial derivative of $f$. In general, no systematic use of such a point approximation
suffices for our application. However, in Subsection 1.3.1 we will examine a principled method of computing
attributions along these lines.

1.2. Axioms for attribution methods. Our attribution problem (almost trivially) generalizes the cost
or surplus sharing problem from the social choice literature (cf. Moulin [16]), where the problem is to
axiomatically share the cost of production or surplus among several agents.^1 The characteristic function is
either cost or surplus, the independent variables correspond to demands or contributions of agents, and the
attributions correspond to cost shares or profit shares. The completeness condition is the budget balance
condition for cost sharing. We give a more detailed discussion of the relationship between these two problems
in Subsection 1.5.

Following the cost sharing literature, we take an axiomatic approach to choosing methods to use for 
attribution problems. In this section we discuss the axioms we consider and briefly discuss motivations for 
them, emphasizing the attribution context; see the cited papers for a longer discussion in the cost sharing 
context. 

• Dummy: If the value of the characteristic function does not depend on a variable, then the attribution
to that variable is zero.

This axiom is very natural, as it simply requires that variables irrelevant to the outcome be ignored.^2

• Dummy': If the value of the characteristic function $f$ does not depend on a variable $i$ on $[r, s]$, then
the attribution $z_i(r, s, f)$ to that variable is zero.

This axiom is a natural strengthening of Dummy. It may be viewed as a local version of the global
axiom Dummy.

• Additivity: For all $r, s, f_1, f_2$, we have that $z_i(r, s, f_1 + f_2) = z_i(r, s, f_1) + z_i(r, s, f_2)$.

This axiom yields a type of procedural invariance. That is, if the system modeled by the characteristic
function can be decomposed into several independent sub-processes that interact additively, we can
compute the attributions separately for each sub-process. Alternatively, Additivity can be justified
via lex parsimoniae. Constructing attributions is equivalent to linearizing the effect of changes in the
independent variables. When an attribution method satisfies Additivity, it is minimal in the sense
that it preserves the pre-existing linear structure of the characteristic function.

• Anonymity: The attributions are unchanged by relabeling of the variables. More formally, for any
permutation $\sigma \in S_n$, if $f_\sigma(r_1, \ldots, r_n) = f(r_{\sigma^{-1}(1)}, \ldots, r_{\sigma^{-1}(n)})$, then for all $i$, we have

$z_{\sigma^{-1}(i)}\big((r_{\sigma(1)}, \ldots, r_{\sigma(n)}), (s_{\sigma(1)}, \ldots, s_{\sigma(n)}), f_\sigma\big) = z_i(r, s, f)$.

Anonymity conveys the idea that all variables in the characteristic function should be treated equally,
up to their initial and final values.

• Conditional Nonnegativity: Suppose the characteristic function $f$ is non-decreasing in a variable
$i$ on $[r, s]$. Then for all $r, s$, if $s_i \ge r_i$ (resp. $s_i \le r_i$), then $z_i(r, s, f) \ge 0$ (resp. $z_i(r, s, f) \le 0$).

• Monotonicity [9]: Suppose the characteristic function $f$ is non-decreasing in variable $j$. Then, for
input pairs $(r, s)$ and $(r, s')$ such that $s_i = s'_i$ for $i \ne j$ and $s_j \le s'_j$, we have $z_j(r, s, f) \le z_j(r, s', f)$.
Monotonicity and Conditional Nonnegativity preclude attributions with counterintuitive signs.



^1 This is also sometimes called the fair division problem.

^2 In the cost sharing context, Dummy is the bedrock of the no cross-subsidy (full-responsibility) theory. In such a theory, the
variable is deemed responsible for asymmetries in the cost as well as asymmetries in the demand; see Moulin and Sprumont [17]
for a discussion.
• Scale Invariance [5]: The attributions are independent of linear rescaling of individual variables.
That is, for any $c > 0$, if $g(r_1, \ldots, r_n) = f(r_1, \ldots, r_j/c, \ldots, r_n)$, then for all $i$ we have

$z_i(r, s, f) = z_i\big((r_1, \ldots, c r_j, \ldots, r_n), (s_1, \ldots, c s_j, \ldots, s_n), g\big)$.

Scale Invariance conveys the idea that the attributions should be independent of the (possibly
incomparable) units in which individual variables are measured. It is especially compelling in the
context of attribution because the different variables may refer to quantities of entirely different
things.

• Affine Scale Invariance [25]: The attributions are invariant under simultaneous affine
transformation of the characteristic function and the variables. That is, for any $c, d > 0$, if $g(r_1, \ldots, r_n) =
f(r_1, \ldots, (r_j - d)/c, \ldots, r_n)$, then for all $i$ we have

$z_i(r, s, f) = z_i\big((r_1, \ldots, c r_j + d, \ldots, r_n), (s_1, \ldots, c s_j + d, \ldots, s_n), g\big)$.

Affine Scale Invariance conveys the idea that both the units and the zero point of individual
variables should not affect the value of the attribution. Again, for attribution this is especially
compelling, since the variables may represent values without naturally defined units or zero points.
For example, temperature is commonly measured in both Celsius and Fahrenheit scales, which are
related by an affine transformation.

We include the Scale Invariance and Monotonicity axioms only to facilitate discussion and comparison
with axiomatizations of attribution methods in the cost sharing literature. They will play no role in the
main results.

1.3. Candidate attribution methods. In this section we describe some attribution methods motivated 
by the cost sharing literature, and mention the axioms they satisfy. 

1.3.1. Path methods. We first consider a natural class of attribution methods, the path attribution methods, 
that are well-studied in the cost sharing context (see [31 [TT]). These methods assign to each variable its 
marginal effect along some path from the initial point to the final point. They are analogous to the approach
based on partial derivatives outlined in Subsection 1.1, but they salvage completeness by integrating the
partial derivatives along a path instead of taking a naive estimate at a single endpoint. Note that their
definition is motivated by Theorem 1.7 from the cost sharing literature, which we discuss in Subsection 1.5.



Definition 1.2. For each $r, s \in \mathbb{R}^n$, let $\gamma_{r,s} : [0, 1] \to \mathbb{R}^n$ be a $C^1$-function with $\gamma_{r,s}(0) = r$ and $\gamma_{r,s}(1) = s$,
which we interpret as a path from $r$ to $s$. Write $\gamma_{r,s} = (\gamma_{r,s,1}, \ldots, \gamma_{r,s,n})$, and let $\gamma_{r,s,i}$ be non-decreasing if
$r_i \le s_i$ and non-increasing if $r_i \ge s_i$. Then, the attribution method given by

(1.1)    $z_i(r, s) = \int_0^1 \partial_i f(\gamma_{r,s}(t)) \, \gamma'_{r,s,i}(t) \, dt$

is the single-path attribution method corresponding to the family of paths $\gamma_{r,s}$. If the method

$z_i(r, s) = \sum_j c_j z_i^j(r, s)$ for $c_j \ge 0$ and $\sum_j c_j = 1$

is a convex combination of single-path attribution methods $z^j$, we say that $z$ is a path attribution method.
For a single-path attribution method, we may check by the gradient theorem that

$z_1(r, s) + \cdots + z_n(r, s) = \int_0^1 \sum_i \partial_i f(\gamma_{r,s}(t)) \, \gamma'_{r,s,i}(t) \, dt = \int_{\gamma_{r,s}} \nabla f \cdot d\gamma = f(s) - f(r)$,

meaning that completeness is satisfied for each single-path attribution method. Completeness is preserved
under convex combinations and therefore holds for all path attribution methods. Further, path attribution
methods satisfy Dummy, Dummy', Additivity, and Conditional Nonnegativity for all characteristic
functions. That Dummy and Dummy' hold is obvious, Additivity holds because partial differentiation is linear,
and Conditional Nonnegativity holds because a characteristic function non-decreasing in a variable has
a non-negative partial derivative with respect to that variable.
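
To make the path integral concrete, the following Python sketch approximates a single-path attribution numerically and checks completeness on the expenditure function of Example 1.1. It is only an illustration under stated assumptions: the helper names `path_attributions` and `straight_line`, the midpoint rule, the central-difference derivatives, and the default step sizes are all choices made here, not part of the original text.

```python
def path_attributions(f, path, n_vars, steps=2000, h=1e-6):
    """Approximate the single-path attribution of every variable along one path.

    f     : function taking a list of n_vars floats
    path  : function t -> list of n_vars floats, with path(0) = r and path(1) = s
    steps : number of midpoint-rule intervals (illustrative default)
    h     : step for central-difference partial derivatives (illustrative default)
    """
    z = [0.0] * n_vars
    for k in range(steps):
        t0, t1 = k / steps, (k + 1) / steps
        start, end = path(t0), path(t1)
        mid = path((t0 + t1) / 2)
        for i in range(n_vars):
            up, down = list(mid), list(mid)
            up[i] += h
            down[i] -= h
            partial = (f(up) - f(down)) / (2 * h)   # i-th partial derivative at the midpoint
            z[i] += partial * (end[i] - start[i])   # times the movement of variable i
    return z


def straight_line(r, s):
    """Hypothetical helper: the path t -> r + t * (s - r)."""
    return lambda t: [ri + t * (si - ri) for ri, si in zip(r, s)]


def expenditure(x):
    a, p, c = x
    return a * p * c


r, s = [4.0, 1.0, 1.0], [5.0, 12.0, 1.5]
z = path_attributions(expenditure, straight_line(r, s), n_vars=3)
change = expenditure(s) - expenditure(r)
```

Up to numerical error, the three attributions sum to the actual change of 86, in contrast to the endpoint linearization, which overshoots.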

In the cost sharing context, Theorem 1.7 implies that these are essentially the only methods that satisfy
Additivity and Dummy for all characteristic functions. We suspect that an analogue of this result also holds
in the attribution context, which guides our intuition. While none of our formal results rely on the notion of
path methods, they provide a convenient way to think about attribution methods and are a useful starting
point in our investigation of desirable attribution methods. We will now identify some specific candidate
path attribution methods, which will use the following general construction.

Definition 1.3. Fix a path $\gamma : [0, 1] \to [0, 1]^n$, non-decreasing in each variable, such that $\gamma(0) = 0$ and
$\gamma(1) = (1, \ldots, 1)$. Write $\gamma = (\gamma_1, \ldots, \gamma_n)$. Then, the single-path attribution method corresponding to

$\gamma_{r,s}(t) = r + \big((s_1 - r_1)\gamma_1(t), \ldots, (s_n - r_n)\gamma_n(t)\big)$

is the affine single-path attribution method corresponding to $\gamma$. An affine path attribution method is a convex
combination of affine single-path attribution methods.

1.3.2. Methods based on the Shapley value. Recall that the Shapley value (Shapley [23]) is a solution concept
in cooperative game theory used to distribute the total surplus generated by a coalition of players among 
the players. We now mention two attribution methods that are adaptations of this discrete solution concept; 
both methods have been well-studied in the context of continuous demand cost sharing. We start with the 
method that is arguably the best known method in the cost sharing literature. 

Definition 1.4. The Aumann-Shapley method is the affine single-path attribution method corresponding
to the path $\gamma_i(t) = t$ for all $i$.

This was identified by Aumann and Shapley @] as a 'value' for non-atomic games. Next, we define a 
different and arguably more direct generalization of the Shapley value. 

Definition 1.5. The Shapley-Shubik method [9l El] is defined as follows. For any $\sigma \in S_n$, let $\gamma^\sigma$ be the
path

$\gamma^\sigma_i(t) = \begin{cases} 0 & tn \le \sigma(i) - 1 \\ tn - \sigma(i) + 1 & \sigma(i) - 1 < tn < \sigma(i) \\ 1 & tn \ge \sigma(i) \end{cases}$

where $\gamma^\sigma$ walks along edges of the hypercube $[0, 1]^n$ in an order determined by $\sigma$. Then, the Shapley-Shubik
method is given by the average of the $n!$ path attribution methods corresponding to the $\gamma^\sigma$.

More generally, a random order method [3T] is any convex combination of the affine path attribution
methods corresponding to the $\gamma^\sigma$. A value-variant random order method is a path attribution method such
that every path in the corresponding families of paths takes the form $\gamma^\sigma$.
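
The Shapley-Shubik method of Definition 1.5 can be sketched in a few lines of Python. The sketch below relies on one simplifying observation that is an assumption of this illustration: along each edge of the hypercube only one variable moves, so for a sufficiently regular characteristic function the path integral over that edge equals the change in the function across the edge. The name `shapley_shubik` and the sample inputs from Example 1.1 are illustrative.

```python
from itertools import permutations


def shapley_shubik(f, r, s):
    """Average, over all orderings, of the change in f when each variable moves from r_i to s_i."""
    n = len(r)
    z = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(r)                 # start every walk at the initial point
        for i in order:
            before = f(x)
            x[i] = s[i]             # move variable i along one hypercube edge
            z[i] += f(x) - before   # credit the change to the variable that moved
    return [zi / len(orders) for zi in z]


def expenditure(x):
    a, p, c = x
    return a * p * c


r, s = [4.0, 1.0, 1.0], [5.0, 12.0, 1.5]
z = shapley_shubik(expenditure, r, s)
change = expenditure(s) - expenditure(r)
```

Each ordering telescopes from the initial to the final value, so the averaged attributions sum to the total change. The cost grows with the number of orderings, so this brute-force form is only practical for a handful of variables.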

Remark. Value-variant random order methods refine the notion of random order method in the following
sense. If we fix the initial and final values $r$ and $s$, the attributions $z_i(r, s)$ of a value-variant random order
method are a convex combination of the values given by applying (1.1) for the paths $\gamma^\sigma$. Such a method is
a random order method if the weights of this convex combination do not depend on $r$ and $s$.

The Aumann-Shapley method and value-variant random order methods (hence random order methods 
and the Shapley-Shubik method) satisfy Additivity, Dummy, and Dummy' because they are path attribution 
methods. The Aumann-Shapley method and random order methods (and hence the Shapley-Shubik method) 
satisfy Affine Scale Invariance by Lemma D.1 because they are affine path attribution methods. The
Aumann-Shapley and Shapley-Shubik methods additionally satisfy Anonymity. 

Remark. We may relate the Shapley-Shubik method to the Aumann-Shapley method as follows. The 
Shapley-Shubik attribution for a change from r to s is the expected attribution of a monotone random walk 
along the edges of the hypercube with opposite vertices at r and s. If we subdivide the hypercube with 
opposite vertices at r and s into a grid of smaller hypercubes and consider monotonic random walks in this 
structure, the density of the resulting walks will be focused on the diagonal. Hence, when the characteristic 
function satisfies some basic regularity conditions, the average of the path attribution methods corresponding 
to these walks will tend to the Aumann-Shapley method in the limit. 
1.4. Statement of results. When choosing an attribution method, it is very desirable to have a uniqueness
result, one which says that there is exactly one method satisfying some axioms, because such a result identifies
a method for use. If an attribution method is the unique method satisfying some axioms on a class of
characteristic functions, we term these axioms an axiomatization for the attribution method. We seek
axiomatizations for a specific class of characteristic functions, namely those that are a sum of an additively
separable function and a multilinear function, defined as follows.

Definition 1.6. A function $f : \mathbb{R}^n \to \mathbb{R}$ is additively separable if there exist $f_i : \mathbb{R} \to \mathbb{R}$ with

$f(r_1, \ldots, r_n) = f_1(r_1) + \cdots + f_n(r_n)$.

A function $f : \mathbb{R}^n \to \mathbb{R}$ is multilinear if we may write $f$ in the form

$f(r_1, \ldots, r_n) = \sum_{I \subseteq [n]} c_I \prod_{i \in I} r_i$,

that is, as the sum of monomials of degree at most 1 in each variable.
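
One way to represent a multilinear function in code is as a map from subsets of variable indices to coefficients. The Python sketch below is an illustration only; the dictionary layout and the name `eval_multilinear` are assumptions made here. It also shows the defining property that the function is affine in each variable when the others are held fixed.

```python
from math import prod


def eval_multilinear(coeffs, x):
    """Evaluate the sum over subsets I of c_I times the product of x_i for i in I.

    coeffs maps a frozenset of variable indices I to the coefficient c_I.
    """
    return sum(c * prod(x[i] for i in subset) for subset, c in coeffs.items())


# e(a, p, c) = a * p * c from Example 1.1, with indices 0, 1, 2 for a, p, c.
expenditure = {frozenset({0, 1, 2}): 1.0}

# A hypothetical function 3 + 2*x0 - x0*x1, chosen only for illustration.
example = {frozenset(): 3.0, frozenset({0}): 2.0, frozenset({0, 1}): -1.0}

value_e = eval_multilinear(expenditure, [5.0, 12.0, 1.5])
value_ex = eval_multilinear(example, [2.0, 4.0])

# Affine in x0 for fixed x1: the value at the midpoint is the average of the endpoints.
low = eval_multilinear(example, [0.0, 4.0])
high = eval_multilinear(example, [2.0, 4.0])
middle = eval_multilinear(example, [1.0, 4.0])
```

The empty subset carries the constant term, since an empty product equals 1.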

We justify our focus on a narrow class of characteristic functions in Section 1.5.3, and we demonstrate that
such characteristic functions have several practical applications in Section 2. We defer these considerations
for now to state our results.

Our main result is that there is a unique attribution method that satisfies Dummy, Additivity, Anonymity,
Conditional Nonnegativity and Affine Scale Invariance for all characteristic functions that are the
sum of a multilinear function and an additive function (Theorem 3.4). Interestingly, Theorem 4.1 shows that
the Aumann-Shapley (Definition 1.4) and Shapley-Shubik (Definition 1.5) methods, both of which satisfy the
axioms mentioned above, coincide for these characteristic functions. We therefore term this the
Aumann-Shapley-Shubik method. We give an efficient algorithm to compute it in Theorem 4.5 and Corollary 4.6.
As an intermediate step toward proving Theorem 3.4, we show that the only methods that satisfy
Additivity, Dummy' and Conditional Nonnegativity for all multilinear characteristic functions are
value-variant random order methods (Theorem 3.2). Surprisingly, the proof implies that every path method (a
continuous concept) is equivalent to some value-variant random order method (a combinatorial concept) for
a multilinear characteristic function.

To complete our results, we show in Theorem 4.4 that for every characteristic function outside this class,
no analog of Theorem 3.4 is possible. That is, we show that the Aumann-Shapley and the Shapley-Shubik
methods coincide if and only if the characteristic function is the sum of a multilinear and an additively
separable characteristic function. This shows that our restriction to this class of characteristic functions is
not simply a technical convenience and provides in Corollary 4.3 an axiomatization of the
Aumann-Shapley-Shubik method. Section 1.5.3 discusses further implications of this result.
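
The coincidence described here can be checked numerically on small examples. The following Python sketch is a sanity check under stated assumptions, not a proof and not the efficient algorithm of Theorem 4.5: the Aumann-Shapley method is approximated by a midpoint rule with central-difference derivatives, the Shapley-Shubik method is computed by brute force over orderings, and the two test functions are hypothetical choices.

```python
from itertools import permutations


def aumann_shapley(f, r, s, steps=2000, h=1e-6):
    """Numerically integrate partial derivatives along the straight line from r to s."""
    n = len(r)
    z = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps
        mid = [ri + t * (si - ri) for ri, si in zip(r, s)]
        for i in range(n):
            up, down = list(mid), list(mid)
            up[i] += h
            down[i] -= h
            z[i] += (f(up) - f(down)) / (2 * h) * (s[i] - r[i]) / steps
    return z


def shapley_shubik(f, r, s):
    """Average over orderings of the change in f as each variable moves from r_i to s_i."""
    n = len(r)
    z = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(r)
        for i in order:
            before = f(x)
            x[i] = s[i]
            z[i] += f(x) - before
    return [zi / len(orders) for zi in z]


def in_class(x):
    # Multilinear part x0*x1*x2 + 2*x0*x1 plus additively separable part x0**3.
    return x[0] * x[1] * x[2] + 2 * x[0] * x[1] + x[0] ** 3


def outside_class(x):
    # x0**2 * x1 is not a sum of a multilinear and an additively separable function.
    return x[0] ** 2 * x[1]


as_in = aumann_shapley(in_class, [1.0, 2.0, 3.0], [2.0, 4.0, 5.0])
ss_in = shapley_shubik(in_class, [1.0, 2.0, 3.0], [2.0, 4.0, 5.0])
as_out = aumann_shapley(outside_class, [0.0, 0.0], [1.0, 1.0])
ss_out = shapley_shubik(outside_class, [0.0, 0.0], [1.0, 1.0])
```

For the first function the two methods agree up to numerical error. For the second, the straight-line integral gives roughly (2/3, 1/3) while the ordering average gives (1/2, 1/2), so the methods differ.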

1.5. Attribution versus cost sharing. In this subsection, we discuss the relationship between our
attribution problem and the classical cost sharing problem.

1.5.1. Cost sharing as attribution. Cost sharing models come in various flavors depending on whether the
demands are binary, integral, or real-valued and whether the cost function is homogeneous or not (see
Moulin [16] for a classification). In this sense our model resembles the rr, heterogeneous cost sharing
model (both the characteristic function and the independent variables are real-valued, and the characteristic
function is not homogeneous in the variables). More precisely, for a monotonically increasing cost function,
rr heterogeneous cost sharing is equivalent to attribution from $0$ to the final demand.

There are two immediate differences between attribution and cost sharing. First, in the attribution 
problem, variables change from one set of values to another, while in cost sharing there is just a single 
set of demands or contributions. Secondly, attribution relaxes the requirement that the cost function be 
monotone. Therefore, while negative cost shares do not make sense, negative attributions can make sense in 
some contexts. 

Remark. A naive approach to attribution might be to determine the attributions $z_i(r, s, f)$ as the difference
of the cost sharing problems $z_i(0, s, f) - z_i(0, r, f)$. Given a valid cost sharing method, this defines a valid
attribution method. However, this approach would not reflect the behavior of the characteristic function
$f$ between $r$ and $s$, instead expressing the idea that the change in $f$ is from $r$ to $0$ and then from $0$ to $s$.
More formally, Dummy' and Affine Scale Invariance are not satisfied by this approach. This suggests that
naively applying the cost sharing framework may not be appropriate in this case.
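
A small numerical example illustrates the failure of Dummy' for the naive approach. The Python sketch below is an illustration under assumptions made here: the Shapley-Shubik method stands in for the underlying cost sharing method, the characteristic function is the product of two variables, and the first variable is held fixed, so the function does not depend on it on the box between the initial and final values.

```python
from itertools import permutations


def shapley_shubik(f, r, s):
    """Average over orderings of the change in f as each variable moves from r_i to s_i."""
    n = len(r)
    z = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(r)
        for i in order:
            before = f(x)
            x[i] = s[i]
            z[i] += f(x) - before
    return [zi / len(orders) for zi in z]


def product(x):
    return x[0] * x[1]


r, s = [1.0, 1.0], [1.0, 2.0]   # only the second variable changes
zero = [0.0, 0.0]

# Direct attribution from r to s.
direct = shapley_shubik(product, r, s)

# Naive approach: difference of two cost sharing problems that both start at 0.
from_zero_to_s = shapley_shubik(product, zero, s)
from_zero_to_r = shapley_shubik(product, zero, r)
naive = [a - b for a, b in zip(from_zero_to_s, from_zero_to_r)]
```

Both results are complete, but the direct attribution assigns nothing to the unchanged first variable, whereas the naive difference assigns it half of the total change.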

1.5.2. Axiomatics for cost sharing. Here we discuss axiomatization results from the cost sharing literature, 
both as motivation for some of our assumptions and as context for our results. We begin with a result 
identifying the analogue of the path attribution methods in cost sharing as exactly those methods satisfying 
the most basic of the axioms we introduced in Subsection 1.2.



Theorem 1.7 (Theorem 1 of [10]). Any cost sharing method satisfying Dummy and Additivity is a path
cost sharing method.

We now give two axiomatizations of the Aumann-Shapley and Shapley-Shubik methods in the cost sharing 
context. The Aumann-Shapley method was axiomatized by Billera and Heath and Mirman [15] in the
following theorem. 

Theorem 1.8 ([5], [9], [15]). The Aumann-Shapley method is the unique cost sharing method that satisfies
Additivity, Dummy, Scale Invariance, and Average Cost for Homogeneous Goods, which states that,
for cost functions that are a function of the sum of the demands, the cost shares should be proportional to
the demands.

For the Shapley-Shubik method, we have the following axiomatization given by Friedman and Moulin [9]. 

Theorem 1.9 (Theorem 1 of [9]). Any cost sharing method satisfying Additivity, Dummy, Monotonicity,
Scale Invariance, and Continuity at Zero (cost shares are continuous in each variable near 0) is a
random order method. The Shapley-Shubik method is the unique cost sharing method that satisfies Anonymity
in addition to Additivity, Dummy, Monotonicity, Scale Invariance, and Continuity at Zero.

Remark. In the attribution context, the Aumann-Shapley and Shapley-Shubik methods satisfy the axioms
of Theorems 1.8 and 1.9, but it is not clear if the uniqueness properties continue to hold. We suspect
that these results should also carry over to the attribution framework (with very similar proofs) after some
appropriate modification.

1.5.3. Axiomatization for attribution versus axiomatization for cost sharing. Our approach to the axiomatic 
study of attribution methods differs from that taken in the cost sharing literature. A typical axiomatic
result in the cost sharing literature (like Theorems 1.8 and 1.9) identifies a certain cost sharing method as
the unique method that satisfies certain axioms for all cost functions (cf. [SI [23]). This does not preclude the 
existence of multiple methods that satisfy the same set of axioms for a certain subclass of cost functions. For 
instance, Redekop [20] notes that the Aumann-Shapley cost sharing method satisfies the axioms mentioned
in the uniqueness result for the Shapley-Shubik method (Theorem 1.9) when the cost function has increasing
marginal costs (i.e., when the cost function is convex).^3

In our model, the characteristic function is known when the attribution method is selected, so general
uniqueness results similar to Theorems 1.8 and 1.9 are not necessarily sufficient to guide the selection of an
attribution method. For instance, for a specific convex characteristic function, there might be more than
one attribution method which satisfies the axioms required in Theorems 1.8 and 1.9, meaning that they are
not enough to select a unique method; in fact, all the applications in Section 2 have convex characteristic
functions. We therefore seek and successfully identify (see Section 1.4) axiomatizations that quantify less
universally over the space of characteristic functions.

In addition, quantifying less universally over the space of characteristic functions allows us to be more 
parsimonious with axioms. In the case of multilinear functions, our main result allows us to characterize the 
Aumann-Shapley method without using Average Cost for Homogeneous Goods, a 'partial domain axiom,' 
which, as Friedman and Moulin [9] argue, is not very natural because it applies only to part of the space of
initial and final values. 

In [18], Owen associates a multilinear function to any cooperative game so that applying the
Aumann-Shapley method to this function yields the Shapley value of the game. A generalization of these techniques
may be used to prove Corollary 3.3 for path methods. In this context, the full Corollary 3.3 may be viewed as a
generalization to the case of an arbitrary attribution method. Further, Theorem 4.4 should be of independent
interest to the cost sharing community because it identifies conditions on the characteristic function under
which two important cost sharing methods, the Aumann-Shapley and Shapley-Shubik methods, can coincide,
giving a converse to the result of [18] under slightly stronger conditions.^4

^3 It is easy to show directly that all path attribution methods satisfying Scale Invariance are monotone for convex functions,
of which multilinear functions with positive coefficients are an instance. Redekop [20] notices this for the Aumann-Shapley
attribution method.

1.6. Notations. We write $[n]$ for the set $\{1, 2, \ldots, n\}$. For a set of variables $c_1, \ldots, c_n$ and a subset $I \subseteq [n]$,
write $c$ for the $n$-tuple $(c_1, \ldots, c_n)$ and $c_I$ for the product $\prod_{i \in I} c_i$ over the indices in $I$. For two sets of
variables $c_1, \ldots, c_n$ and $d_1, \ldots, d_n$, we write $c \le d$ (resp. $c < d$) if $c_i \le d_i$ (resp. $c_i < d_i$) for all $i$. We write
$[c, d]$ for the closed box $I_1 \times \cdots \times I_n$, where $I_i$ is the closed interval bounded by $c_i$ and $d_i$. We use $0$ to
denote the vector $(0, \ldots, 0)$ containing all 0's. The length of this vector will always be clear from context.
For a function $f : \mathbb{R}^n \to \mathbb{R}$ and a multiset of indices $a$, we denote by $\partial_a f$ the mixed partial derivative with
respect to the indices in $a$. In all cases where this construction appears, we will assume that $f$ is chosen so
that Young's theorem on the equality of mixed partial derivatives holds.

2. Applicability of our Model 

In this section we discuss practical applications of our model. For our model to be applicable, the 
characteristic function must be known, deterministic, and multilinear, and the values of the variables at the 
initial and final points must be known exactly. The examples in this section satisfy these properties, and are 
indicative of other settings in which our model is potentially applicable. 

We begin with a few examples motivated by the Internet. 

Example 2.1 (Pay-per-click advertising [28]). The characteristic function is the spend $s$ of an advertiser,
which can be expressed as the product $s = c \cdot p$ of the number of clicks $c$ that an advertiser's advertisement
received and the average cost per click $p$. The final values of $s$, $c$, and $p$ are statistics from a certain
week, and the initial values are statistics from the preceding week. The problem then is to identify to
what extent the advertiser's change in spend is due to a change in the number of clicks versus a change in
the cost per click.

A more granular spend model applicable in a specific form of pay-per-click advertising called sponsored
search advertising is

$f_{\mathrm{spend}} = q \cdot b \cdot \sum_i p_i \cdot \mathrm{CTR}_i \cdot \mathrm{CPC}_i$.

Here, $q$ is the number of ad-views that the advertiser is eligible for, $b$ is the probability that the ads have
sufficient budget to show, $p_i$ is the probability that an ad appears in the $i$-th auction position, and $(\mathrm{CTR}_i, \mathrm{CPC}_i)$
are the click-through rate and the cost per click for the $i$-th auction position.^5
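
For the two-factor model in Example 2.1, where spend is the number of clicks times the cost per click, averaging the two orderings of Definition 1.5 gives a closed form: each variable's change is weighted by the average of the other variable's initial and final values. The Python sketch below applies this formula, which is derived here for illustration, to hypothetical weekly statistics; the function name and all numbers are assumptions.

```python
def two_factor_attributions(c0, p0, c1, p1):
    """Shapley-Shubik attributions for spend = clicks * cost_per_click.

    Averages the two orderings (clicks move first, price moves first).
    """
    z_clicks = (c1 - c0) * (p0 + p1) / 2
    z_price = (p1 - p0) * (c0 + c1) / 2
    return z_clicks, z_price


# Hypothetical weekly statistics, chosen only for illustration.
clicks_before, cpc_before = 1000.0, 0.50
clicks_after, cpc_after = 1200.0, 0.45

z_clicks, z_price = two_factor_attributions(
    clicks_before, cpc_before, clicks_after, cpc_after
)
spend_change = clicks_after * cpc_after - clicks_before * cpc_before
```

In this hypothetical case spend rises by 40: about 95 is attributed to the increase in clicks and about -55 to the drop in cost per click.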

Example 2.2 (e-Commerce website analysis). Consider an online retailer's website. We can model the
website as a directed acyclic graph with a single sink $t$, which is the page displayed on a successful transaction
(see Archak et al. [3] and Immorlica et al. [13] for similar models). For every page $j$, let $s_j$ denote the number
of times that a surfer starts on page $j$. For every hyperlink directed from page $i$ to page $j$, let $p_{ij}$ denote the
probability on average that a surfer follows this link given that he or she is at page $i$. The expected number
of successful transactions is

$\sum_{i \in V} s_i \sum_{P \text{ a path from } i \text{ to } t} \; \prod_{(r, s) \in P} p_{rs}$,

which is multilinear. The initial values for the variables are average statistics for the last year, and the final
values are the same statistics for this year. The attributions to the variables $\{s_j\}$ and $\{p_{ij}\}$ may then yield
insight into changes in traffic patterns that impact sales.
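
The characteristic function of Example 2.2 can be evaluated without enumerating paths explicitly. The Python sketch below assumes that the expected number of transactions is the sum, over start pages, of the number of starts times the total probability of reaching the sink, and computes that probability by memoized recursion over the acyclic graph. The function name, the page names, and all numbers are hypothetical.

```python
from functools import lru_cache


def expected_transactions(starts, links, sink):
    """starts: {page: number of starts}; links: {(i, j): follow probability}; sink: final page."""
    outgoing = {}
    for (i, j), prob in links.items():
        outgoing.setdefault(i, []).append((j, prob))

    @lru_cache(maxsize=None)
    def reach(page):
        # Sum over all paths from `page` to the sink of the product of link probabilities.
        if page == sink:
            return 1.0
        return sum(prob * reach(nxt) for nxt, prob in outgoing.get(page, []))

    return sum(count * reach(page) for page, count in starts.items())


# Hypothetical site with four pages; "done" is the successful-transaction page.
starts = {"home": 100.0, "product": 50.0}
links = {
    ("home", "product"): 0.5,
    ("home", "cart"): 0.1,
    ("product", "cart"): 0.4,
    ("cart", "done"): 0.5,
}

total = expected_transactions(starts, links, "done")
```

Each start count and each link probability appears with degree at most one in every term, which is what makes the function multilinear in these variables.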

In our model, we require the characteristic function to be known and deterministic. This is in contrast 
to the fields of Regression Analysis jl^. where the function and the inputs are statistical quantities and 
require model fitting and estimation, and structural equation modeling [6], where additionally the variables
may also require inference. Example 2.1 satisfies these conditions because the characteristic function models
a software system whose working is known and deterministic; Example 2.2 satisfies these conditions because
the characteristic function models a flow through a known graph; and Example 1.1 from the introduction
satisfies these conditions because it models a known supply chain.

^4 We note that our proof of Theorem 4.4 does require the attribution context, however, as it relies crucially on the fact that
attributions exist between any two pairs of values.

^5 Recall that all major search engines place some ads based on the results of an auction.

Further, in both examples, we explain the change in performance of a system that occurred in the past.
Consequently, we have full information about the initial and final values of the variables. Our model is not
explicitly set up to be predictive about the future, though the insights gained could potentially be useful to
guide future decisions. For instance, in Example 2.1, advertisers who notice that a large negative impact is
attributed to the budget variable $b$ may choose to raise their budgets.

Remark. One advantage of attribution is that it enables the comparison of changes in unlike quantities.
For instance, in Example 2.1, the advertiser can meaningfully compare the impact of a change in the cost per
click to the impact of a change in the budget-related throttling rate ($b$). Such comparisons can aid decision
making. In this example, the advertiser can decide if it is more worthwhile to focus on changing budgets or
controlling the cost per click (by changing bids in the ad auction).

While the previous examples were about explaining the change in the performance of a system, the 
next example, which is motivated by investment, involves comparing the performance of a system against a 
benchmark. 

Example 2.3. The performance of a portfolio can be expressed as the sum 

$$\sum_i w_i r_i,$$

where $r_i$ is the return within an asset class $i$ and $w_i$ the amount invested within this asset class. Performance 
attribution [29] attempts to explain why the performance of a portfolio (the final variables) deviates from 
the performance of a benchmark portfolio (the initial variables). In particular, it asks whether the deviation 
in performance is due to the difference in the allocation of investments across asset classes (the attributions 
to the $w_i$'s) or to the selection of assets within an asset class (the attributions to the $r_i$'s). 

The standard way of doing performance attribution involves considering an active allocation term 
$r_i^1 \cdot (w_i^2 - w_i^1)$, a security selection term $w_i^1 \cdot (r_i^2 - r_i^1)$, and a slack term 
$(r_i^2 - r_i^1) \cdot (w_i^2 - w_i^1)$ for each asset class, where superscripts 1 and 2 denote the benchmark (initial) 
and portfolio (final) values; the latter term is necessary for completeness, but does not yield any insight. 
In contrast, our approach yields completeness automatically. 
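
As a rough illustration of this comparison, the following Python sketch computes both decompositions for a hypothetical two-asset-class portfolio. The data and the function names (`brinson_terms`, `shapley_shubik_bilinear`) are illustrative assumptions, not taken from the source. The attribution formula used is the two-variable Shapley-Shubik average for a single product of a weight and a return, which amounts to splitting the slack term equally between allocation and selection.

```python
from fractions import Fraction as F

def brinson_terms(w1, r1, w2, r2):
    """Standard decomposition for one asset class: (allocation, selection, slack)."""
    return r1 * (w2 - w1), w1 * (r2 - r1), (r2 - r1) * (w2 - w1)

def shapley_shubik_bilinear(w1, r1, w2, r2):
    """Shapley-Shubik attributions (to w, to r) for f = w * r.

    Each attribution averages the marginal change over the two orders
    in which w and r can move from their initial to their final values.
    """
    z_w = (w2 - w1) * (r1 + r2) / 2
    z_r = (r2 - r1) * (w1 + w2) / 2
    return z_w, z_r

# Hypothetical data: (benchmark weight, benchmark return, portfolio weight, portfolio return)
classes = {
    "equity": (F(60, 100), F(5, 100), F(70, 100), F(8, 100)),
    "bonds": (F(40, 100), F(2, 100), F(30, 100), F(1, 100)),
}

for name, (w1, r1, w2, r2) in classes.items():
    alloc, select, slack = brinson_terms(w1, r1, w2, r2)
    z_w, z_r = shapley_shubik_bilinear(w1, r1, w2, r2)
    total = w2 * r2 - w1 * r1
    assert alloc + select + slack == total  # completeness needs the slack term
    assert z_w + z_r == total               # completeness holds with no slack term
    assert z_w == alloc + slack / 2 and z_r == select + slack / 2
    print(name, float(z_w), float(z_r))
```

Exact rational arithmetic is used here only so that the completeness checks hold with equality rather than up to rounding.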

Finally, here is an example from performance analysis of basketball statistics. 

Example 2.4. Suppose the coaching staff of a basketball team wants insight into the change in offensive 
performance of the team from last year (the initial version of the variables) to this year (the final version 
of the variables). Such studies are currently done in other frameworks, as in [22] or [27]. Letting $n_i$, $m_i$, $a_i$, 
and $p_i$ be the number of games per season, the number of minutes per game, the number of attempts per 
minute, and the field goal percentage of each player, the total number of points scored by the team is 

$$f_{\text{points}} = \sum_i n_i \cdot m_i \cdot a_i \cdot \frac{p_i}{100}.$$

Using attributions for $f_{\text{points}}$ in combination with other information can help the coaches understand and 
refine the performance of the team. 

We now give two remarks illustrating some advantages of our attribution approach. 

Remark. A common way for humans to perform attribution relies on counterfactual intuition. For instance, 
when we assert that smoking causes cancer, there is a presumption that holding all other things constant, 
not smoking will reduce the chance of contracting cancer. Such counterfactual semantics have been used as 
the basis for logics of causation (see Chapter 7 from Pearl [19], for instance). 



Path methods (Definition 1.2), and hence all the methods we consider in this paper, have a natural 
counterfactual interpretation. Every path method considers the counterfactual of moving between the initial and 
final values along the chosen family of paths. Breaking this down, we may consider a path as the limit 
of piecewise linear paths which change only one independent variable at a time. From this viewpoint, the 
attribution to an independent variable is simply the cumulative change in the function due to this infinite 
number of infinitesimal counterfactuals. 
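
The limiting picture above can be sketched numerically. The following Python fragment is a minimal illustration under stated assumptions: the function name `staircase_attributions`, the test function $x_1 x_2$, and the endpoints are all hypothetical choices. It approximates the straight-line path from $r$ to $s$ by a staircase that moves one variable at a time and credits each small change in the function to the variable that moved.

```python
def staircase_attributions(f, r, s, steps):
    """Attribute f(s) - f(r) along a staircase path from r to s.

    The path makes `steps` rounds; in each round every variable in turn
    moves 1/steps of the way from its initial to its final value, and the
    resulting change in f is credited to the variable that moved.
    """
    n = len(r)
    x = list(r)
    z = [0.0] * n
    for _ in range(steps):
        for i in range(n):
            before = f(x)
            x[i] += (s[i] - r[i]) / steps
            z[i] += f(x) - before
    return z

def product(x):
    return x[0] * x[1]

r, s = (1.0, 2.0), (3.0, 5.0)
for steps in (1, 10, 1000):
    print(steps, staircase_attributions(product, r, s, steps))
```

With a single round the staircase is just one variable ordering; as the number of rounds grows, the attributions for this product approach $(s_1 - r_1)(r_2 + s_2)/2 = 7$ and $(s_2 - r_2)(r_1 + s_1)/2 = 6$, the straight-line values.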
Remark. Let us reiterate the benefit of a method satisfying Affine Scale Invariance in light of the 
above examples. For many attribution problems, the units in which variables are measured are a matter of 
convention and are not canonical in any sense. For instance, in pay-per-click advertising (Example 2.1), the 
cost of advertising may be measured as the cost per thousand impressions or the cost per million impressions 
(see [28]), and, in basketball statistics, field goal accuracy is popularly expressed as a percentage between 
0 and 100 rather than an accuracy rate between 0 and 1. 

In these examples, it is critical that a different scaling of the units does not change the attribution. Specifying 
variables can be even more difficult than this, however. Consider a characteristic function which depends 
on a dimensionless physical quantity such as the Reynolds number of a chaotic fluid or the Prandtl number 
of a material. Such quantities lack natural units or even canonical reference points; for them, we would like 
the attribution to be invariant not only under rescaling of units but also under changes of the zero points of these 
units. Such changes are exactly affine transformations, leading us to the Affine Scale Invariance axiom. 

We conclude this section with a discussion of the importance of applying attribution techniques carefully 
to yield meaningful insights. 



Remark. We return to the context of Example 2.1. Besides sponsored search advertising, another common 
form of advertising is content advertising, that is, advertising on websites. Search ads commonly have a 
higher cost per click (CPC) than content ads because they are typically more contextual. 

Consider an advertiser who employs both forms of advertising, and who seeks to use attribution methods 
to analyze the impact of a change in the CPC on the amount of money it spends on advertising. Suppose 
that the situation of the advertiser is summarized by Table 1. Note that though the search CPC and the 
content CPC have both doubled, the overall CPC has actually fallen because of an increase in the proportion 
of clicks from content ads. 

There are two possible ways to perform attribution in this situation. One way is to first compute the 
change in the overall CPC and use this to perform attribution (using the Aumann-Shapley-Shubik method, 
for instance). Because the overall CPC fell, we would conclude that CPCs had a negative impact on 
spend. Alternatively, if we reasoned about the impact of a change in the CPC of search ads and content ads 
separately, and then aggregated the attributions, we would come up with the more meaningful conclusion that 
CPCs had a positive impact on the change in spend. Thus, aggregating the attributions is more meaningful 
than attributing with aggregates in this example. 





           Search CPC ($)   Search Clicks   Content CPC ($)   Content Clicks   Overall CPC ($)
Initial    1                100             0.01              100              0.505
Final      2                100             0.02              10000            0.0396

Table 1. An example with mix effects. 
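
The two ways of attributing in Table 1 can be compared with a short Python sketch. It is a minimal illustration, assuming that spend is modeled as CPC times clicks and that the two-variable Shapley-Shubik formula for a product is applied, either once to the aggregate quantities or separately to search and content (which is justified for the sum by Additivity and Dummy). The helper names `bilinear_attributions` and `overall` are hypothetical.

```python
from fractions import Fraction as F

def bilinear_attributions(a1, b1, a2, b2):
    """Shapley-Shubik attributions (to a, to b) for f = a * b,
    moving from (a1, b1) to (a2, b2)."""
    return (a2 - a1) * (b1 + b2) / 2, (b2 - b1) * (a1 + a2) / 2

# Table 1: (CPC, clicks) for each ad type, initial and final.
search = {"initial": (F(1), F(100)), "final": (F(2), F(100))}
content = {"initial": (F(1, 100), F(100)), "final": (F(2, 100), F(10000))}

def overall(period):
    """Overall CPC and total clicks for one period."""
    spend = sum(cpc * clicks for cpc, clicks in (search[period], content[period]))
    clicks = search[period][1] + content[period][1]
    return spend / clicks, clicks

# (a) Attribute with aggregates: spend = overall CPC * total clicks.
agg_cpc, agg_clicks = bilinear_attributions(*overall("initial"), *overall("final"))

# (b) Aggregate the attributions: attribute each product separately, then sum.
s_cpc, s_clicks = bilinear_attributions(*search["initial"], *search["final"])
c_cpc, c_clicks = bilinear_attributions(*content["initial"], *content["final"])

print("aggregate first: CPC impact =", float(agg_cpc))
print("attribute first: CPC impact =", float(s_cpc + c_cpc))
```

Both decompositions account for the same total change in spend (from 101 to 400), but the sign of the CPC impact differs between them, which is the mix effect described in the remark.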



3. Characterizations of attribution methods for multilinear functions 

In this section, we seek an axiomatization for the class of multilinear functions. We focus on the class 
of multilinear functions for two reasons. First, this class of functions has several applications as illustrated 
in the previous section. Second, as discussed in Subsection 1.5.3, axiomatizations over a narrow family of 
functions can be more meaningful than axiomatizations that quantify widely over characteristic functions. 

We ignore additively separable functions for the rest of this section due to the following uniqueness result. 

Lemma 3.1. On additively separable functions, there is a unique attribution method that satisfies Additivity 
and Dummy. 

Proof. Write an additively separable function $f$ in the form $f(r_1, \ldots, r_n) = f_1(r_1) + \cdots + f_n(r_n)$. By 
Additivity, for all $i$ we have 

$$z_i(r, s, f) = z_i(r, s, f_1) + \cdots + z_i(r, s, f_n).$$

Now, by Dummy, $z_i(r, s, f_j) = 0$ for $j \neq i$, so by completeness $z_i(r, s, f_i) = f_i(s_i) - f_i(r_i)$, which implies that 
$z_i(r, s, f) = f_i(s_i) - f_i(r_i)$ is the unique attribution method on additively separable functions. □ 
3.1. Methods that satisfy Additivity, Dummy', and Conditional Nonnegativity. In this section we 
characterize attribution methods that satisfy the basic axioms Additivity, Dummy', and Conditional 
Nonnegativity for multilinear characteristic functions. 

Theorem 3.2. Any attribution method on multilinear functions which satisfies Additivity, Dummy', and 
Conditional Nonnegativity is a value-variant random order method. 

Proof. Let $V$ be the space of multilinear functions on $x_1, \ldots, x_n$ and note that the monomials 
$x_I = \prod_{i \in I} x_i$ give a basis of $V$. Fix some $r, s$, and let $K$ be the set of indices $k$ such that $r_k \neq s_k$. 
It will suffice to show that the attributions given by $z_i(r, s, -)$ correspond to the attributions of some 
value-variant random order method on $(r, s)$. 

Step 1: The space of attribution methods 

Fix an attribution method $z_i$ satisfying Additivity, Dummy', and Conditional Nonnegativity, and 
define $z_{i,I} := z_i(r, s, x_I)$. By Lemma B.4 applied to $z_i$, Additivity and Conditional Nonnegativity 
together imply that $z_i(r, s, -)$ is linear and therefore uniquely determined by its values $z_{i,I}$ on the monomials 
$x_I$. By Dummy' and completeness, we see that such $z_{i,I}$ satisfy 

(3.1)  $z_{i,I} = 0$ if $i \notin I$, 

(3.2)  $z_{i,I} = 0$ for $i \notin K$, 

(3.3)  $z_{i,I} = r_{I-K} \, z_{i,I \cap K}$, and 

(3.4)  $\sum_{i \in I \cap K} z_{i,I} = s_I - r_I$. 

Here, (3.1) and (3.2) follow immediately from Dummy', (3.3) follows by noting that 

$$z_{i,I} - r_{I-K} \, z_{i,I \cap K} = z_i(r, s, x_I - r_{I-K} \, x_{I \cap K}) = 0$$

by Additivity and Dummy', and (3.4) follows from completeness. Values of $z_{i,I}$ satisfying constraints (3.1) 
through (3.4) are completely determined by the values of $z_{i,I}$ with $i \in I$ and $I \subset K$. In fact, setting $k = |K|$, 
they form an affine subspace $\mathcal{A}'$ of dimension 

$$\sum_{I \subset K} \max\{|I| - 1, 0\} = \sum_{i=1}^{k} \binom{k}{i} (i - 1) = \sum_{i=1}^{k} i \binom{k}{i} - (2^k - 1) = k 2^{k-1} - 2^k + 1$$

inside the space $\mathcal{Z}$ of all tuples $\{z_{i,I}\}$ with $i \in I$, $I \subset K$. 

Considering Conditional Nonnegativity gives the additional constraint 

(3.5)  $\sum_I a_I z_{i,I}$ has the same sign as $(s_i - r_i)$ if $\sum_I a_I x_I$ is non-decreasing in $x_i$. 

Any $\{z_{i,I}\}$ satisfying (3.1) through (3.5) gives rise to a valid set of attributions on $(r, s)$. Hence, an attribution 
method satisfying Additivity, Dummy', and Conditional Nonnegativity is characterized on the pair of 
values $(r, s)$ by the closed subspace $\mathcal{A}$ of $\mathcal{A}'$ defined by 

$$\mathcal{A} := \Big\{ \{z_{i,I}\} \text{ for } i \in I,\ I \subset K \;\Big|\; \sum_{i \in I} z_{i,I} = s_I - r_I \text{ and } z_{i,I} \text{ satisfy (3.5)} \Big\}.$$

Step 2: The space of value-variant random order methods 

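Let us now characterize the functionals $z_i(r, s, -)$ given by value-variant random order methods. The 
space of attributions on $(r, s)$ which can result from value-variant random order methods is specified by 
giving for each $J \subset K$ and $j \in J$ a weight $c_{j,J}$ so that 

(3.6)  $\sum_{j \in J} c_{j,J} = \sum_{j \in K - J} c_{j, J \cup \{j\}}$ for $\emptyset \neq J \subsetneq K$, 

(3.7)  $\sum_i c_{i,\{i\}} = 1$, 

(3.8)  $\sum_i c_{i,K} = 1$, and 

(3.9)  $c_{j,J} \geq 0$. 
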
i 
Note here that (3.8) is implied by (3.6) and (3.7). The space of such methods is therefore the closed subset 
$\mathcal{R}$ defined by (3.9) lying within the affine space $\mathcal{R}'$ defined by (3.6) through (3.8) inside the space $\mathcal{C}$ of all 
tuples $\{c_{j,J}\}$ with $j \in J$, $J \subset K$. Notice that $\mathcal{R}'$ has dimension at least 

$$\sum_{i=1}^{k} i \binom{k}{i} - (2^k - 1) = k 2^{k-1} - 2^k + 1.$$




Step 3: Mapping from value-variant random order methods to attribution methods 

We now understand the map $\phi : \mathcal{R} \to \mathcal{A}$ between value-variant random order methods and attribution 
methods; it will be induced by a linear map $\Phi : \mathcal{C} \to \mathcal{Z}$. For $I \subset K$ and $i \in I$, setting 
$z^{ro}_{i,I} := z^{ro}_i(r, s, x_I)$ and $I' = I - \{i\}$, the map $\Phi$ is given explicitly by 

(3.10)  $z^{ro}_{i,I} = \sum_{J \ni i} c_{i,J} \big( s_{I \cap J} \, r_{I - J} - s_{I \cap J - \{i\}} \, r_{(I - J) \cup \{i\}} \big)$ 

$\qquad\quad = \sum_{J \ni i} c_{i,J} \, s_{I \cap J - \{i\}} \, r_{I - J} \, (s_i - r_i) = \sum_{J' \subset K - \{i\}} c_{i, J' \cup \{i\}} \, s_{I' \cap J'} \, r_{I' - J'} \, (s_i - r_i).$ 

Because value-variant random order methods are attribution methods satisfying Additivity, Dummy', and 
Conditional Nonnegativity, the resulting attributions satisfy the constraints (3.1) through (3.5). 

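Step 4: Checking that $\Phi$ is injective 

We now claim that $\Phi$ is injective. By (3.10), the map $\Phi$ is given by a $k 2^{k-1} \times k 2^{k-1}$ matrix $\Phi$ such that 

• $\Phi$ is block diagonal with $2^{k-1} \times 2^{k-1}$ blocks, and 

• the $i$-th block $\Phi^i$ of $\Phi$ is indexed by subsets of $K - \{i\}$ and has entries 

$$\Phi^i_{I',J'} = (s_i - r_i) \, s_{I' \cap J'} \, r_{I' - J'},$$

where $I', J' \subset K - \{i\}$. 

We must check that $\Phi^i$ is non-singular for each $i$. For this, we claim that 

$$\det \Phi^i = (s_i - r_i)^{2^{k-1}} \prod_{j \in K - \{i\}} (s_j - r_j)^{2^{k-2}},$$

which is non-zero because $r_j \neq s_j$ for all $j \in K$. Consider the matrices 

$$A_K := \big( s_{I \cap J} \, r_{I - J} \big)_{I, J \subset K}.$$

We will show by induction on $k = |K|$ that 

$$\det A_K = \prod_{j \in K} (s_j - r_j)^{2^{k-1}},$$

which implies the desired claim because $\Phi^i = (s_i - r_i) A_{K - \{i\}}$. In the base case $k = 1$, we see that 

$$A_K = \begin{pmatrix} 1 & 1 \\ r_1 & s_1 \end{pmatrix}$$

and the conclusion is obvious. Now suppose the statement holds for some $k$ and take some $K$ with $|K| = k + 1$. 
Then, pick some $j \in K$ and set $K' = K - \{j\}$. Then, placing $A_K$ into block form, we have 

$$\det(A_K) = \det \begin{pmatrix} A_{K'} & A_{K'} \\ r_j A_{K'} & s_j A_{K'} \end{pmatrix} = \det \begin{pmatrix} A_{K'} & 0 \\ r_j A_{K'} & (s_j - r_j) A_{K'} \end{pmatrix} = \det(A_{K'})^2 \, (s_j - r_j)^{2^k} = \prod_{i \in K} (s_i - r_i)^{2^k},$$

completing the induction. 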
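
The determinant identity for $A_K$ can be spot-checked on a small instance. The following Python sketch is only a sanity check under stated assumptions: the values of $r$ and $s$ are arbitrary, the helper names are hypothetical, and exact rational arithmetic with a hand-written Gaussian elimination is used to avoid rounding. It builds the matrix with entries $s_{I \cap J} \, r_{I - J}$ for $|K| = 3$ and compares its determinant with $\prod_{j \in K} (s_j - r_j)^{2^{|K|-1}}$.

```python
from fractions import Fraction
from itertools import combinations

def subsets(items):
    """All subsets of `items`, as frozensets."""
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def prod(values):
    result = Fraction(1)
    for v in values:
        result *= v
    return result

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    det = Fraction(1)
    for col in range(len(m)):
        pivot = next((row for row in range(col, len(m)) if m[row][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for row in range(col + 1, len(m)):
            factor = m[row][col] / m[col][col]
            for j in range(col, len(m)):
                m[row][j] -= factor * m[col][j]
    return det

def A(r, s):
    """Matrix with entries s_{I & J} * r_{I - J}; rows I and columns J range over subsets of K."""
    subs = subsets(range(len(r)))
    return [[prod(s[i] for i in I & J) * prod(r[i] for i in I - J) for J in subs]
            for I in subs]

r = [Fraction(1), Fraction(2), Fraction(-1)]
s = [Fraction(4), Fraction(0), Fraction(3)]
expected = prod((sk - rk) ** (2 ** (len(r) - 1)) for rk, sk in zip(r, s))
assert determinant(A(r, s)) == expected
```

Rows and columns use the same ordering of subsets, so the determinant does not depend on which ordering is chosen.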

Step 5: Putting everything together 

We have now shown that $\Phi$ is injective as a map $\mathcal{C} \to \mathcal{Z}$. Therefore, $\phi : \mathcal{R}' \to \mathcal{A}'$ is an injective linear 
map between affine spaces with $\dim \mathcal{R}' \geq \dim \mathcal{A}'$, hence an isomorphism of affine spaces. It remains only to 
show that this isomorphism restricts to an isomorphism $\mathcal{R} \to \mathcal{A}$; for this, we match the conditions (3.5) 
and (3.9) to check that $\phi$ maps $\mathcal{R}' - \mathcal{R}$ to $\mathcal{A}' - \mathcal{A}$. 

For any $\{c_{j,J}\} \in \mathcal{R}' - \mathcal{R}$, choose $j$ and $J$ with $j \in J \subset K$ such that $c_{j,J} < 0$. For $I \subset [n]$, define the 
points $u^I$ and $v^I$ by 

$$u^I_i = \begin{cases} r_i & i \notin I \\ s_i & i \in I \end{cases} \qquad\text{and}\qquad v^I_i = \begin{cases} s_i & i \notin I \\ r_i & i \in I. \end{cases}$$

By Lemma B.2 we may find a multilinear function $h(x_1, \ldots, \hat{x}_j, \ldots, x_n)$ so that $h(u^I) = \delta_{I,J}$. Now, $h$ is 
linear in each $x_i$, hence it is non-negative on $[r, s]$ because it is non-negative on each of the vertices of $[r, s]$. 
Therefore, the multilinear function 

$$g(x) = x_j \, h(x_1, \ldots, \hat{x}_j, \ldots, x_n)$$

satisfies $g(u^J) - g(u^{J - \{j\}}) = s_j - r_j$ and $g(u^I) - g(u^{I - \{j\}}) = 0$ for all $I \neq J$. Further, because $h$ is non-negative 
on $[r, s]$, $g$ is non-decreasing in $x_j$ on $[r, s]$. On the other hand, we see that 

$$z^{ro}_j(r, s, g) = \sum_{I \ni j} c_{j,I} \big( g(u^I) - g(u^{I - \{j\}}) \big) = c_{j,J} \big( g(u^J) - g(u^{J - \{j\}}) \big) = c_{j,J} (s_j - r_j),$$

which has opposite sign from $s_j - r_j$. This means that the image of $\{c_{j,J}\} \in \mathcal{R}' - \mathcal{R}$ under $\phi$ does not satisfy 
Conditional Nonnegativity, hence lies in $\mathcal{A}' - \mathcal{A}$. Therefore, we conclude that $\phi$ maps $\mathcal{R}' - \mathcal{R}$ to $\mathcal{A}' - \mathcal{A}$, 
hence maps $\mathcal{R}$ bijectively to $\mathcal{A}$, as needed. □ 



Remark. In the proof of Theorem 3.2, Conditional Nonnegativity plays two different roles. First, it 
provides the technical condition that allows us to convert from Additivity to linearity by using Lemma 
B.4. Secondly and more crucially, it is necessary because any value-variant random order method is a convex 
combination of the attributions along the paths rather than an affine combination. As a result, such a 
method satisfies Conditional Nonnegativity. 

Recall from Section 1.3.1 that path attribution methods satisfy Additivity, Dummy', and Conditional 
Nonnegativity for all characteristic functions. Therefore, for multilinear functions, Theorem 3.2 implies 
that all path attribution methods are value-variant random order methods. This is somewhat surprising 
because random order methods are inherently combinatorial and may be evaluated using the values of 
the characteristic function at a finite set of points, while path attribution methods require in general a 
continuous evaluation of the characteristic function. Thus we see that the form of the characteristic function 
in Theorem 3.2 is key in reducing the latter continuous evaluation to a discrete one. See Section 4 for an 
explicit illustration of this in the context of the Aumann-Shapley method. 

3.2. Methods that satisfy Affine Scale Invariance. The characterization in the previous section allows 
for significant freedom in the selection of an attribution method, arguably undesirably so. For instance, it 
is possible to vary the convex combination over the random order paths for each r, s in some discontinuous 
way. To address this issue, we impose in this subsection a continuity condition on our paths. Following 
our axiomatic approach, we would like to impose this continuity condition on paths via an axiom on our 
attribution methods. A natural candidate, then, is Affine Scale Invariance, as it is a continuity condition 
on attributions and has a very natural interpretation in the attribution context. With the addition of Affine 
Scale Invariance, we have the following. 

Corollary 3.3. Any attribution method on multilinear functions satisfying Additivity, Dummy, Conditional 
Nonnegativity, and Affine Scale Invariance is a random order method. 

Proof. First, we claim that Dummy and Affine Scale Invariance imply Dummy' for $r, s$ such that $r_i \neq s_i$ 
for all $i$. Suppose that a characteristic function $f$ does not depend on the value of $r_i$ on $[r, s]$. We may 
write 

$$f(r_1, \ldots, r_n) = f^0(r_1, \ldots, \hat{r}_i, \ldots, r_n) + f^1(r_1, \ldots, \hat{r}_i, \ldots, r_n) \, r_i,$$

where $f^1(r_1, \ldots, \hat{r}_i, \ldots, r_n) = 0$ on $[r, s]$, which is Zariski dense in $\mathbb{R}^n$, hence $f^1 = 0$ as a polynomial. This 
implies that $f = f^0$, so the result holds by Dummy. 

Now, take $r^* = (0, \ldots, 0)$ and $s^* = (1, \ldots, 1)$. By Theorem 3.2 we see that 

$$z_i(r^*, s^*, -) = z^*_i(r^*, s^*, -)$$

for some random order method $z^*$. Because $z_i$ and $z^*$ both satisfy Affine Scale Invariance, this implies 
that $z_i = z^*$ is a random order method, as needed. □ 
3.3. Main Result. The characterization in the previous section allows us to treat independent variables 
asymmetrically. For instance, we could consider only a single random order path in our convex combination. 
But there appears to be no a priori reason to treat variables asymmetrically, and so we impose Anonymity, which 
gives us the following axiomatization. 

Corollary 3.4. There is a unique attribution method on multilinear functions satisfying Additivity, Dummy, 
Conditional Nonnegativity, Affine Scale Invariance, and Anonymity. 

Proof. By Corollary 3.3, such a method must be a random order method. But there is a unique random 
order method satisfying Anonymity, the Shapley-Shubik method, as needed. □ 

4. The Aumann-Shapley-Shubik method 



Recall from Section 1.3.2 that the Aumann-Shapley method satisfies all the axioms mentioned in 
Corollary 3.4 for every characteristic function, while Corollary 3.4 shows that there is a unique method that 
satisfies these axioms for multilinear functions. This implies that the Aumann-Shapley method coincides 
with the Shapley-Shubik method for multilinear functions. We note that a proof by direct computation is 
also possible; for completeness, we show this proof in Appendix C. 

Theorem 4.1. If $f$ is the sum of a multilinear function and an additively separable function, then the 
Aumann-Shapley (Definition 1.4) and Shapley-Shubik (Definition 1.5) attribution methods agree for $f$. 


We illustrate the attributions that Aumann-Shapley and Shapley-Shubik yield on small instances of 
multilinear functions in the following example. 

Example 4.2. For $f(r_1, r_2) = r_1 r_2$, these methods coincide and both methods give: 

$$z_1(r, s, f) = (s_1 - r_1) \, \frac{r_2 + s_2}{2} \qquad\text{and}\qquad z_2(r, s, f) = (s_2 - r_2) \, \frac{r_1 + s_1}{2}.$$

In particular, when $r = 0$, both methods correspond to an equal split. For $f(r_1, r_2, r_3) = r_1 r_2 r_3$, the 
attributions again agree and are 

$$z_1(r, s, f) = (s_1 - r_1) \, \frac{2 r_2 r_3 + 2 s_2 s_3 + r_2 s_3 + s_2 r_3}{6},$$

$$z_2(r, s, f) = (s_2 - r_2) \, \frac{2 r_1 r_3 + 2 s_1 s_3 + r_1 s_3 + s_1 r_3}{6}, \text{ and}$$

$$z_3(r, s, f) = (s_3 - r_3) \, \frac{2 r_1 r_2 + 2 s_1 s_2 + r_1 s_2 + s_1 r_2}{6}.$$

We may now define the Aumann-Shapley-Shubik method for characteristic functions that are the sum of 
a multilinear and an additively separable function as the method equivalent to both the Aumann-Shapley 
and Shapley-Shubik methods. Summarizing the conclusions of Theorem 4.1 and Corollary 3.4, we obtain 
the following axiomatic characterization of the Aumann-Shapley-Shubik method. 

Corollary 4.3. For characteristic functions $f$ which are the sum of a multilinear function and an 
additively separable function, the Aumann-Shapley-Shubik method is the unique method satisfying Additivity, 
Dummy, Conditional Nonnegativity, Anonymity, and Affine Scale Invariance. 

Remark. Corollary 3.4 and the fact that the Shapley-Shubik method satisfies Monotonicity together 
imply that the Aumann-Shapley-Shubik method satisfies Monotonicity. Further, Sprumont and Wang [25] 
show that the Shapley-Shubik method satisfies a property stronger than Affine Scale Invariance called 
Ordinal Invariance, meaning that the Shapley-Shubik method is invariant under all order-preserving 
(monotone) reparameterizations of the variables. Corollary 3.4 implies that this carries over to the 
Aumann-Shapley-Shubik method. 
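
As a numerical sanity check of the agreement stated in Theorem 4.1, the following Python sketch evaluates both methods on the product of three variables. It is a minimal illustration under stated assumptions: the function names are hypothetical, the Shapley-Shubik method is computed by brute-force enumeration of variable orders, and the Aumann-Shapley path integral is evaluated with composite Simpson's rule (which is exact here because the integrand is a quadratic polynomial in $t$).

```python
from fractions import Fraction
from itertools import permutations

def shapley_shubik(f, r, s):
    """Average, over all orders of the variables, of the marginal change in f
    when each variable moves from r[i] to s[i] in that order."""
    n = len(r)
    z = [Fraction(0)] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(r)
        for i in order:
            before = f(x)
            x[i] = s[i]
            z[i] += f(x) - before
    return [zi / len(orders) for zi in z]

def aumann_shapley(grad, r, s, panels=4):
    """Integrate grad_i(f) * (s_i - r_i) along the straight line from r to s
    with composite Simpson's rule."""
    def integrand(t):
        x = [ri + t * (si - ri) for ri, si in zip(r, s)]
        return [g * (si - ri) for g, ri, si in zip(grad(x), r, s)]
    h = Fraction(1, 2 * panels)
    z = [Fraction(0)] * len(r)
    for k in range(2 * panels + 1):
        weight = 1 if k in (0, 2 * panels) else (4 if k % 2 else 2)
        for i, v in enumerate(integrand(k * h)):
            z[i] += weight * v * h / 3
    return z

def f(x):
    return x[0] * x[1] * x[2]

def grad(x):
    return [x[1] * x[2], x[0] * x[2], x[0] * x[1]]

r = [Fraction(1), Fraction(2), Fraction(3)]
s = [Fraction(4), Fraction(6), Fraction(5)]

# Closed form for the first variable, as in the three-variable case of Example 4.2.
closed_form_z1 = (s[0] - r[0]) * (2*r[1]*r[2] + 2*s[1]*s[2] + r[1]*s[2] + s[1]*r[2]) / 6
assert shapley_shubik(f, r, s) == aumann_shapley(grad, r, s)
assert shapley_shubik(f, r, s)[0] == closed_form_z1
```

The attributions sum to $f(s) - f(r) = 120 - 6 = 114$, as completeness requires.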

4.1. When do Aumann-Shapley and Shapley-Shubik agree? Having identified the Aumann-Shapley- 
Shubik method as a uniquely desirable one for characteristic functions which are the sum of a multilinear 
function and an additively separable function, we now consider when it exists. As we show in the following 
Theorem 4.4, this will occur only if the characteristic function $f$ takes this form. 

Theorem 4.4. If the Aumann-Shapley and Shapley-Shubik attribution methods agree for some cost function 
$f$, then $f$ is the sum of a multilinear function and an additively separable function. 
Proof. By Lemma B.1 it suffices for us to show that $\partial_{iij} f = 0$ for distinct $i, j$. We first consider the case 
$n = 2$, in which case we wish to show that $\partial_{12} f$ is constant. Then, for any $r = (r_1, r_2)$ and $s = (s_1, s_2)$ with 
$r < s$, the Aumann-Shapley attribution to the second variable is 

$$z_2^{AS}(r, s, f) = \int_0^1 \partial_2 f(\gamma(t)) \, \dot{\gamma}_2(t) \, dt$$

with $\gamma(t) = (1 - t) r + t s$. On the other hand, the Shapley-Shubik attribution is 

$$z_2^{SS}(r, s, f) = \tfrac{1}{2} \big[ f(r_1, s_2) - f(r_1, r_2) \big] + \tfrac{1}{2} \big[ f(s_1, s_2) - f(s_1, r_2) \big].$$

Subdivide the rectangle $R$ with vertices at $(r_1, r_2)$, $(r_1, s_2)$, $(s_1, r_2)$, and $(s_1, s_2)$ into the triangular regions 
$T_1$ lying above the path of $\gamma$ and $T_2$ lying below the path of $\gamma$ as shown in Figure 1(a) below. 

Figure 1. Steps in the proof of Theorem 4.4. 

Then, by Stokes' Theorem, we have 

$$\int_{T_1} \partial_{12} f(x_1, x_2) \, dx_1 dx_2 = \int_{\partial T_1} \partial_2 f(x_1, x_2) \, dx_2 = \int_0^1 \partial_2 f(\gamma(t)) \, \dot{\gamma}_2(t) \, dt - \big[ f(r_1, s_2) - f(r_1, r_2) \big]$$

and 

$$\int_{T_2} \partial_{12} f(x_1, x_2) \, dx_1 dx_2 = \int_{\partial T_2} \partial_2 f(x_1, x_2) \, dx_2 = \big[ f(s_1, s_2) - f(s_1, r_2) \big] - \int_0^1 \partial_2 f(\gamma(t)) \, \dot{\gamma}_2(t) \, dt.$$

Because $z_2^{AS}(r, s, f) = z_2^{SS}(r, s, f)$ by assumption, subtracting the two previous equations and applying our 
previous computations gives that 

(4.1)  $\int_{T_1} \partial_{12} f(z_1, z_2) \, dz_1 dz_2 = \int_{T_2} \partial_{12} f(z_1, z_2) \, dz_1 dz_2$ 

for any choice of $r, s$. In particular, applying (4.1) for the pairs $(r, s)$, $(r, \frac{r+s}{2})$, and $(\frac{r+s}{2}, s)$ and subtracting 
the result of the latter two from the first, we obtain 

(4.2)  $\int_{[r_1, \frac{r_1 + s_1}{2}] \times [\frac{r_2 + s_2}{2}, s_2]} \partial_{12} f = \int_{[\frac{r_1 + s_1}{2}, s_1] \times [r_2, \frac{r_2 + s_2}{2}]} \partial_{12} f$ 

for all $r, s$. The results of this process are shown in Figure 1(b). Now, for any $x = (x_1, x_2)$, set $x' = (x_1, -x_2)$. 
Applying (4.2) to the pairs $(r, r + 2x)$, $(r + x', r + x' + 2x)$, ..., $(r + n x', r + n x' + 2x)$, we find that for any 
$n$ we have 

(4.3)  $\int_{[r_1, r_1 + x_1] \times [r_2 + x_2, r_2 + 2 x_2]} \partial_{12} f = \int_{[r_1 + (n+1) x_1, r_1 + (n+2) x_1] \times [r_2 - n x_2, r_2 - (n-1) x_2]} \partial_{12} f.$ 

This process is shown in Figure 1(c). 

Suppose now for the sake of contradiction that $\partial_{12} f$ were not constant. Then, there must exist some 
$r < s$ such that $\partial_{12} f(r) \neq \partial_{12} f(s)$. Suppose without loss of generality that $\partial_{12} f(r) > \partial_{12} f(s)$. Because 
$\partial_{12} f$ is continuous, we may find open neighborhoods $U$ of $r$ and $V$ of $s$ such that $\partial_{12} f(x) > \partial_{12} f(y)$ for 
$x \in U$, $y \in V$. Now, choose $x = (x_1, x_2)$ and $n$ so that $[r_1, r_1 + x_1] \times [r_2 + x_2, r_2 + 2 x_2] \subset U$ and that 
$[r_1 + (n+1) x_1, r_1 + (n+2) x_1] \times [r_2 - n x_2, r_2 - (n-1) x_2] \subset V$, in which case (4.3) provides a contradiction. 
Therefore, $\partial_{12} f$ is constant, which completes the proof in the case $n = 2$. 

For the general case, choose any two variables $r_i$ and $r_j$. Restricting to attributions between points with 
all other variables held fixed, the $n = 2$ case tells us that $\partial_{ij} f$ is independent of $r_i$ and $r_j$, which means 
exactly that $\partial_{iij} f = 0$ and $\partial_{ijj} f = 0$. This holds for all $i, j$, so $f$ takes the desired form. □ 

One implication of Theorem 4.4 is that we will need a different axiomatization for characteristic functions 
that are not the sum of an additive and a multilinear function. Additionally, it justifies our restriction to 
sums of multilinear and additive characteristic functions. 
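
The dichotomy can be seen on two small two-variable functions. The following Python sketch is a minimal illustration under stated assumptions: the function names and the two test functions are hypothetical choices, and the Aumann-Shapley integral is computed with composite Simpson's rule, which is exact for the polynomial integrands used here. The function $x_1 x_2 + x_2^2$ is the sum of a multilinear and an additively separable function, so the two methods agree; for $x_1^2 x_2$, where $\partial_{112} f = 2 \neq 0$, they differ.

```python
from fractions import Fraction

def shapley_shubik_2d(f, r, s):
    """Average of the marginal changes over the two variable orders."""
    z1 = Fraction(1, 2) * ((f(s[0], r[1]) - f(r[0], r[1])) + (f(s[0], s[1]) - f(r[0], s[1])))
    z2 = Fraction(1, 2) * ((f(r[0], s[1]) - f(r[0], r[1])) + (f(s[0], s[1]) - f(s[0], r[1])))
    return z1, z2

def aumann_shapley_2d(d1f, d2f, r, s, panels=8):
    """Composite Simpson approximation of the straight-line path integrals."""
    h = Fraction(1, 2 * panels)
    z1 = z2 = Fraction(0)
    for k in range(2 * panels + 1):
        t = k * h
        x1 = r[0] + t * (s[0] - r[0])
        x2 = r[1] + t * (s[1] - r[1])
        w = 1 if k in (0, 2 * panels) else (4 if k % 2 else 2)
        z1 += w * h / 3 * d1f(x1, x2) * (s[0] - r[0])
        z2 += w * h / 3 * d2f(x1, x2) * (s[1] - r[1])
    return z1, z2

r, s = (Fraction(0), Fraction(0)), (Fraction(1), Fraction(1))

# f = x1*x2 + x2^2: multilinear plus additively separable, so the methods agree.
ss_a = shapley_shubik_2d(lambda a, b: a * b + b * b, r, s)
as_a = aumann_shapley_2d(lambda a, b: b, lambda a, b: a + 2 * b, r, s)
assert ss_a == as_a

# f = x1^2 * x2: not of that form, and the methods disagree.
ss_b = shapley_shubik_2d(lambda a, b: a * a * b, r, s)
as_b = aumann_shapley_2d(lambda a, b: 2 * a * b, lambda a, b: a * a, r, s)
assert ss_b != as_b
print(ss_b, as_b)
```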



Remark. The proof of Theorem 4.4 relied heavily on Stokes' theorem. In fact, this approach works more 
generally to compare general path attribution methods; we summarize the idea briefly here. Consider a 
single-path attribution method corresponding to a family of paths $\gamma_{r,s}$. Letting $I_{r,s}$ be the (closed) image of 
$\gamma_{r,s}$ in $[r, s]$, we see that the attributions are given by 

(4.4)  $z_i(r, s) = \int_0^1 \partial_i f(\gamma_{r,s}(t)) \, (\dot{\gamma}_{r,s})_i(t) \, dt = \int_{I_{r,s}} \partial_i f(r) \, dr_i,$ 

where we view $\partial_i f(r) \, dr_i$ as a differential form on $I_{r,s}$. From this perspective, it is clear that $z_i(r, s)$ depends 
only on the underlying set $I_{r,s}$ of the path and not on the choice of parametrization $\gamma_{r,s}$. 

We can now use this viewpoint to compare methods. Consider the case $n = 2$. Let $\gamma^1_{r,s}$ and $\gamma^2_{r,s}$ be two 
families of paths and consider the corresponding single-path attribution methods. If these methods coincide 
for some characteristic function $f$, then for all $r, s$, we have for all $i$ that 

$$z_i(r, s, f) = \int_{I^1_{r,s}} \partial_i f \, dr_i = \int_{I^2_{r,s}} \partial_i f \, dr_i,$$

where $I^1_{r,s}$ and $I^2_{r,s}$ are the images in $[r, s]$ of $\gamma^1_{r,s}$ and $\gamma^2_{r,s}$, respectively. Suppose for simplicity that the closed 
curve formed by first traversing $\gamma^1_{r,s}$ and then traversing $\gamma^2_{r,s}$ is not self-intersecting. Then, it bounds an 
open set $A_{r,s}$ in $[r, s]$. From (4.4), we then find that 

(4.5)  $0 = \int_{I^1_{r,s}} \partial_2 f \, dr_2 - \int_{I^2_{r,s}} \partial_2 f \, dr_2 = \int_{A_{r,s}} \partial_{12} f \, dr_1 dr_2,$ 

where the final equality follows from Stokes' Theorem. We have therefore translated condition (4.4) involving 
line integrals to condition (4.5) involving area integrals. In the situation of Theorem 4.4, this condition is 
(4.1), which we may analyze by elementary means because $T_1$ and $T_2$ are geometrically quite simple. The 
general case seems to require different techniques; some ongoing work in this direction by the authors uses 
an approach involving tools from wavelet theory. 

4.2. Computing Aumann-Shapley-Shubik. In this subsection, we discuss the efficient computation of 
the Aumann-Shapley-Shubik method for multilinear functions.* As discussed at the end of Subsection 3.1, 
if $f$ is a multilinear function, then this method is computable in finite time because it coincides with the 
Shapley-Shubik method. Indeed, the attributions given by the Shapley-Shubik method are the average of the 
marginal impact of changing a variable over the finite number of possible variable orderings. However, there 
does not always exist an efficient (polynomial time) algorithm to compute the Shapley-Shubik attributions 
(see the hardness results in [8, 14]). 

Now, for $f(r) = r_1 \cdots r_n$, the most basic example of a multilinear function, the Aumann-Shapley-Shubik 
attributions $z_i(r, s, f)$ are computable in finite time, as to compute the Shapley-Shubik attributions in this 
case it suffices to evaluate $f$ a finite number of times. In principle, this may involve $O(2^n)$ evaluations, 
one for each of the vertices of $[r, s]$. However, Theorem 4.5 below implies that in this case we may compute 
attributions in time quadratic in the number of variables. If we instead consider general multilinear functions, 
iterating the algorithm of Theorem 4.5 in Corollary 4.6 yields runtime quadratic in the number of variables 
and linear in the number of non-zero monomials in the characteristic function. These two results together 
ensure that our attribution theory is not impractical for computational reasons. 

* We ignore additively separable functions because the attribution assigned to a variable is simply the change 
in the function in which it appears. 
Theorem 4.5. Let $f(r) = r_1 \cdots r_n$. Then, for any $r, s$ and each $i$, the Aumann-Shapley-Shubik attribution 
$z_i(r, s, f)$ is computable in $O(n^2)$ time and $O(n)$ memory. 

Proof. From the calculations in the computational proof of Theorem 4.1 given in Appendix C, the attributions 
are given by 

$$z_i(r, s, f) = \frac{1}{n!} (s_i - r_i) \sum_{K \subset [n] - \{i\}} |K|! \, (n - 1 - |K|)! \; s_K \, r_{[n] - \{i\} - K}$$

$$\qquad\qquad = \frac{1}{n!} (s_i - r_i) \sum_{k=0}^{n-1} k! \, (n - 1 - k)! \sum_{\substack{K \subset [n] - \{i\} \\ |K| = k}} s_K \, r_{[n] - \{i\} - K},$$

so it suffices to compute this value. The computation is invariant under relabeling of coordinates, so we may 
assume for convenience of notation that $i = n$. In this case, we have 

$$z_n(r, s, f) = \frac{1}{n!} (s_n - r_n) \sum_{k=0}^{n-1} k! \, (n - 1 - k)! \sum_{\substack{K \subset [n-1] \\ |K| = k}} s_K \, r_{[n-1] - K}.$$

Our approach is to compute the sums 

$$X_{k,m} := \sum_{\substack{K \subset [m] \\ |K| = k}} s_K \, r_{[m] - K}$$

for $m \leq n - 1$ and $0 \leq k \leq m$ using dynamic programming. Computing $z_i(r, s, f)$ then requires only a simple 
summation. Algorithm 1 formalizes this idea. 

Algorithm 1. Computing the Aumann-Shapley-Shubik attribution $z_n(r, s, f)$. 

    X_{0,0} <- 1
    for m = 1 to n - 1 do
        X_{0,m} <- r_m * X_{0,m-1}
        for k = 1 to m - 1 do
            X_{k,m} <- s_m * X_{k-1,m-1} + r_m * X_{k,m-1}
        end for
        X_{m,m} <- s_m * X_{m-1,m-1}
    end for
    return (1/n!) * (s_n - r_n) * sum_{k=0}^{n-1} k! * (n - 1 - k)! * X_{k,n-1}

The correctness of Algorithm 1 follows from the evident recursion 

$$X_{k,m} = \begin{cases} r_m \cdot X_{0,m-1} & k = 0 \\ s_m \cdot X_{k-1,m-1} + r_m \cdot X_{k,m-1} & 1 \leq k \leq m - 1 \\ s_m \cdot X_{m-1,m-1} & k = m \end{cases}$$

and the expression for $z_i(r, s, f)$ obtained at the beginning of the proof. There are $O(n^2)$ iterations of the 
loop, each taking $O(1)$ time to update $X_{k,m}$, giving a total runtime of $O(n^2)$. Further, at each step, only 
the values of $X_{k,m}$ for $0 \leq k \leq m$ and $X_{k,m-1}$ for $0 \leq k \leq m - 1$ are required; storing only these yields a 
memory requirement of $O(n)$. □ 
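
The dynamic program of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, indices are 0-based, the relabeling that places variable $i$ last is done by simply skipping it, and exact rational arithmetic is used so that the result can be compared with a brute-force enumeration of all $n!$ variable orders.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def product_attribution(r, s, i):
    """Attribution z_i(r, s, f) for f(x) = x_1 * ... * x_n, following Algorithm 1.

    X[k] holds X_{k,m}: the sum, over k-subsets K of the first m variables
    other than i, of the product of s_j (j in K) and r_j (j not in K).
    """
    n = len(r)
    others = [j for j in range(n) if j != i]  # relabel so that i comes last
    X = [Fraction(1)]                         # X_{0,0} = 1
    for m, j in enumerate(others, start=1):
        new = [Fraction(0)] * (m + 1)
        new[0] = r[j] * X[0]
        for k in range(1, m):
            new[k] = s[j] * X[k - 1] + r[j] * X[k]
        new[m] = s[j] * X[m - 1]
        X = new                               # only two rows are ever alive
    total = sum(factorial(k) * factorial(n - 1 - k) * X[k] for k in range(n))
    return (s[i] - r[i]) * total / factorial(n)

def brute_force_attribution(r, s, i):
    """Shapley-Shubik attribution to variable i by enumerating all n! orders."""
    n = len(r)
    z = Fraction(0)
    for order in permutations(range(n)):
        x = list(r)
        for j in order:
            before = prod(x)
            x[j] = s[j]
            if j == i:
                z += prod(x) - before
    return z / factorial(n)

r = [Fraction(v) for v in (1, 2, 3, -1, 5)]
s = [Fraction(v) for v in (4, 6, 5, 2, 0)]
for i in range(len(r)):
    assert product_attribution(r, s, i) == brute_force_attribution(r, s, i)
```

The brute-force check is exponential in $n$ and is included only to validate the quadratic-time routine on a small instance.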

Corollary 4.6. Let $f$ be a multilinear characteristic function in $n$ variables with $N$ non-zero monomial 
terms. Then, the Aumann-Shapley-Shubik attribution $z_i(r, s, f)$ is computable in $O(n^2 \cdot N)$ time and $O(n)$ 
memory. 

Proof. By Additivity and Dummy, we may simply run the algorithm of Theorem 4.5 $N$ times, once for each 
non-zero monomial in $f$, and sum the resulting contributions. This trivially gives the desired runtime and 
memory costs. □ 
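
The reduction to monomials can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the representation of a multilinear function as a dictionary from tuples of variable indices to coefficients is a hypothetical choice, as are the function names, and the per-monomial routine repeats the dynamic program of Algorithm 1 so that the block is self-contained.

```python
from fractions import Fraction
from math import factorial, prod

def product_attribution(r, s, i):
    """Attribution to variable i for the product of all variables (Algorithm 1)."""
    n = len(r)
    X = [Fraction(1)]
    for m, j in enumerate((j for j in range(n) if j != i), start=1):
        new = [Fraction(0)] * (m + 1)
        new[0] = r[j] * X[0]
        for k in range(1, m):
            new[k] = s[j] * X[k - 1] + r[j] * X[k]
        new[m] = s[j] * X[m - 1]
        X = new
    total = sum(factorial(k) * factorial(n - 1 - k) * X[k] for k in range(n))
    return (s[i] - r[i]) * total / factorial(n)

def multilinear_attributions(terms, r, s):
    """Attributions for f = sum of coeff * product of x_j over each key of `terms`.

    By Additivity, monomials are handled one at a time; by Dummy, a variable
    receives nothing from a monomial in which it does not appear.
    """
    z = [Fraction(0)] * len(r)
    for variables, coeff in terms.items():
        rv = [r[j] for j in variables]
        sv = [s[j] for j in variables]
        for pos, j in enumerate(variables):
            z[j] += coeff * product_attribution(rv, sv, pos)
    return z

def evaluate(terms, x):
    return sum(c * prod(x[j] for j in vs) for vs, c in terms.items())

# Hypothetical example: f(x) = 2*x0*x1 + x0*x1*x2 + 5*x2
terms = {(0, 1): 2, (0, 1, 2): 1, (2,): 5}
r = [Fraction(1), Fraction(2), Fraction(3)]
s = [Fraction(4), Fraction(6), Fraction(5)]
z = multilinear_attributions(terms, r, s)
assert sum(z) == evaluate(terms, s) - evaluate(terms, r)  # completeness
print(z)
```

This sketch computes the attributions to all variables at once, so its cost is one run of the dynamic program per variable per monomial; the bound in Corollary 4.6 is stated for a single attribution $z_i$.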






References 

[1] AczEL, J., AND Erdos, P. The non-existence of a Hamel-basis and the general sokition of Cauchy's functional equation 
for non-negative numbers. Publ. Math. Debrecen 12 (1965), 259—263. 

Apostol, T. M. Calculus, Vol. 2: Multi-Variable Calculus and Linear Algebra with Applications. Wiley, 1969. 
Archak, N., Mirrokni, V. S., and Muthukrishnan, S. Mining advertiser-specific user behavior using adfactors. In 
Proceedings of the 19"^ international conference on World wide web (New York, NY, USA, 2010), WWW '10, ACM, 
pp. 31-40. 

Aumann, R. J., AND Shapley, L. S. Values of non-atomic games, 1974. 

BiLLERA, L. J., AND Heath, D. C. Allocation of shared costs: A set of axioms yielding a unique procedure. Mathematics 
of Operations Research 7, 1 (1982), 32-39. 

BOLLEN, K. A. Structural Equations with Latent Variables, 1st ed. Wiley-Interscience, April 1989. 
Darboux, M. Sur le theoreme fondamental de la geometric projective. Math. Ann. 17 (1880), 33-42. 

Deng, X., and Papadimitriou, C. H. On the complexity of cooperative solution concepts. Math. Oper. Res. 19 (May 
1994), 257-266. 

Friedman, E., and Moulin, H. Three methods to share joint costs or surplus. Journal of Economic Theory 87, 2 (August 
1999), 275-312. 

Friedman, E. J. Paths and consistency in additive cost sharing. International Journal of Game Theory 32, 4 (August 
2004), 501-518. 

Haimanko, O. Partially symmetric values. Math. Oper. Res. 25 (November 2000), 573-590. 

Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer Series in Statistics. Springer 
New York Inc., New York, NY, USA, 2001. 

Immorlica, N., Jain, K., and Mahdian, M. Game-theoretic aspects of designing hyperlink structures. In Internet and 
Network Economics (WINE) (Patras, Greece, December 2006), vol. 4286, Springer LNCS, pp. 150-161. 

Matsui, Y., and Matsui, T. NP-completeness for calculating power indices of weighted majority games. Theor. Comput. 
Sci. 263 (July 2001), 306-310. 

Mirman, L. J., and Tauman, Y. Demand compatible equitable cost sharing prices. Math. Oper. Res. 7, 1 (1982), 40-56. 

Moulin, H. Axiomatic cost and surplus sharing. In Handbook of Social Choice and Welfare, K. J. Arrow, A. K. Sen, and 
K. Suzumura, Eds., vol. 1 of Handbook of Social Choice and Welfare. Elsevier, April 2002, ch. 6, pp. 289-357. 

Moulin, H., and Sprumont, Y. Responsibility and cross-subsidization in cost sharing. Cahiers de recherche 19-2002, 
Centre interuniversitaire de recherche en économie quantitative, CIREQ, 2002. 

Owen, G. Multilinear extensions of games. Management Science 18 (January 1972), 64-79. 

Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000. 

Redekop, J. Increasing marginal cost and the monotonicity of Aumann-Shapley pricing, mimeo. University of Waterloo, 
Ontario, Canada, 1996. 

Roth, A. E. Probabilistic values for games. Cambridge University Press, 1988. 

Sampaio, J., Ibanez, S., Lorenzo, A., and Gomez, M. Discriminative game-related statistics between basketball starters 
and nonstarters when related to team quality and game outcome. Perceptual and Motor Skills 103 (October 2006), 486-494. 
Shapley, L. S. A value for n-person games. Contributions to the theory of games 2 (1953), 307-317. 

Shapley, L. S., and Shubik, M. A method for evaluating the distribution of power in a committee system. The American 
Political Science Review 48, 3 (1954), 787-792. 

Sprumont, Y., and Wang, Y. Ordinal additive cost-sharing methods must be random order values, mimeo, Université de 
Montréal, 1996. 

Warner, F. W. Foundations of Differentiable Manifolds and Lie Groups. Springer, 1983. 

Wikipedia. Basketball statistics. http://en.wikipedia.org/wiki/Basketball_statistics 

Wikipedia. Pay-per-click advertising. http://en.wikipedia.org/wiki/Pay_per_click 

Wikipedia. Performance attribution. http://en.wikipedia.org/wiki/Performance_attribution 

Appendix A. A review of Stokes' theorem 

In this appendix, we give a brief intuitive introduction to Stokes' theorem, as it relates to our paper, for 
the unfamiliar reader. To minimize technical difficulties, we restrict ourselves to the case of dimension two, 
where Stokes' theorem coincides with Green's theorem, and suppress technical assumptions. First, we state 
a basic version of the theorem. 

Theorem A.1 (Stokes' Theorem). Let $A$ be the region enclosed by a smooth closed curve in the plane. Let 
$f$ be a differentiable function defined on an open neighborhood of $A$, and let $\partial A$ be the (oriented) boundary 
of $A$. Then, we have 

(A.1) $$\int_{\partial A} f \, dx_2 = \int_A \partial_1 f \, dx_1 \, dx_2.$$



Let us explain intuitively the meaning of Theorem A.1. It relates the path integral of the 1-dimensional 
differential form $f \, dx_2$ along the boundary $\partial A$ of $A$ to the double integral of its exterior derivative 
$d(f \, dx_2) = \partial_1 f \, dx_1 \, dx_2$ on the interior of $A$. We may visualize this in Figure 2 below. 

Figure 2. A region $A$ and its boundary $\partial A$ in Stokes' theorem. 



It may be instructive to consider an analogy between Stokes' theorem and the fundamental theorem of 
calculus (which is actually Stokes' theorem in dimension 1). For a differentiable function $F$, the fundamental 
theorem of calculus relates the integral of $F'(x)$ along an interval $[a, b]$ to the difference in values of $F$ on the 
endpoints of this interval. That is, it states that 

$$\int_a^b F'(x) \, dx = F(b) - F(a).$$

Stokes' theorem generalizes the fundamental theorem of calculus in the sense that it replaces the concept of 
an interval with a simple region, and the endpoints of the interval (which form its boundary) with the closed 
curve that forms the boundary of the region. Its proof is also ultimately an application of the fundamental 
theorem of calculus. We refer the interested reader to Chapter 11 of [3] or to [55] for more detailed expositions 
of Stokes' theorem, which also appears in various engineering applications such as electrostatics and fluid 
dynamics. 
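
As a concrete instance of the two-dimensional statement, the following minimal sketch checks numerically that the boundary integral of $f \, dx_2$ and the area integral of $\partial_1 f$ agree on the unit disk. The choice $f(x_1, x_2) = x_1^3$, the unit disk, and the function names are assumptions made purely for illustration; both integrals should be close to $3\pi/4$.

```python
import math

def f(x1, x2):
    # Example integrand (an assumption for illustration): f(x1, x2) = x1**3.
    return x1 ** 3

def d1_f(x1, x2):
    # Partial derivative of f with respect to x1.
    return 3.0 * x1 ** 2

def boundary_integral(n=2000):
    # Integral of f dx2 along the unit circle, traversed counterclockwise,
    # using the parametrization (cos(theta), sin(theta)) and the midpoint rule.
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        x1, x2 = math.cos(theta), math.sin(theta)
        total += f(x1, x2) * math.cos(theta) * h  # dx2 = cos(theta) dtheta
    return total

def area_integral(n_r=200, n_t=200):
    # Integral of d1_f over the unit disk in polar coordinates (midpoint rule).
    hr, ht = 1.0 / n_r, 2.0 * math.pi / n_t
    total = 0.0
    for a in range(n_r):
        rho = (a + 0.5) * hr
        for b in range(n_t):
            theta = (b + 0.5) * ht
            x1, x2 = rho * math.cos(theta), rho * math.sin(theta)
            total += d1_f(x1, x2) * rho * hr * ht  # area element rho drho dtheta
    return total

print(boundary_integral(), area_integral(), 3.0 * math.pi / 4.0)
```

The two quadratures are independent of each other, so their agreement is a genuine (if informal) check of (A.1) for this one example rather than a restatement of it.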

In this paper, Stokes' theorem is particularly convenient because it allows manipulation of line integrals of 
1-dimensional differential forms. We see that the attributions given by path attribution methods take exactly 
this form for differential forms involving the characteristic function. Applying Stokes' theorem now yields 
conditions on the area integral of a mixed partial which we use as a starting point for further considerations. 



Appendix B. Technical results on multilinear functions 

In this appendix we state and prove some technical results about multilinear functions which are used 
in our proofs. We begin with an alternate characterization of functions which are the sum of a multilinear 
function and an additively separable function. 

Lemma B.1. A function $f : \mathbb{R}^n \to \mathbb{R}$ is the sum of a multilinear function and an additively separable 
function if and only if $\partial_{iij} f = 0$ for all $i \neq j$. 

Proof. It is obvious that the sum of a multilinear function and an additively separable function has this 
property, so it remains to show the converse. We proceed by induction on $n$, with the base case $n = 1$ trivial. 
Now, if $n > 1$, since $\partial_{11j} f = 0$ for all $j \neq 1$, we may write $\partial_{11} f = q(r_1)$ as a function of $r_1$ only, hence 
integrating twice in $r_1$ we see that 

$$f(r) = Q(r_1) + r_1 h(r) + p(r),$$

where $Q'' = q$ and $\partial_1 h = \partial_1 p = 0$. It remains to show that $h$ is multilinear and that $p$ is the sum of a multilinear 
function and an additively separable function. Now, for any distinct $i, j \neq 1$, we have that 

$$0 = \partial_{iij} f = r_1 \partial_{iij} h + \partial_{iij} p,$$

so taking $r_1 = 0$ shows that $\partial_{iij} p = 0$. Hence $p$ is the sum of a multilinear function and an additively 
separable function by the inductive hypothesis. Now, notice that for $i \neq 1$, we have 

$$\partial_{ii} f = r_1 \partial_{ii} h + \partial_{ii} p$$

and 

$$0 = \partial_{1ii} f = \partial_{ii} h,$$

so $h$ is multilinear. This completes the induction. 



□ 
Remark. The condition in Lemma B.1 is a mixture of the conditions for $f$ to be multilinear ($\partial_{ii} f = 0$) and 
to be additively separable ($\partial_{ij} f = 0$). 

Now, fix $r, s \in \mathbb{R}^n$. Our next two results provide alternate bases for the space of multilinear functions 
which will be convenient in analyzing their restriction to the vertices of $[r, s]$. For $I \subseteq [n]$, define the points 
$u^I$ and $w^I$ by 

$$u^I_i = \begin{cases} r_i & i \notin I \\ s_i & i \in I \end{cases} \quad \text{and} \quad w^I_i = \begin{cases} s_i & i \notin I \\ r_i & i \in I. \end{cases}$$

Notice that $u^I = w^{[n] - I}$ and that the vertices of $[r, s]$ are exactly the points $u^I$ as $I$ ranges over the subsets 
of $[n]$. We then have the following two characterizations of multilinear functions. 

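Lemma B.2. For any $r, s \in \mathbb{R}^n$ with $r_i \neq s_i$ for all $i$, there exists for any $I \subseteq [n]$ a multilinear function $g_I$ 
such that $g_I(u^J) = \delta_{IJ}$. 

Proof. Define $g_I$ by 

$$g_I(x) = \prod_{i \in [n]} \frac{x_i - w^I_i}{u^I_i - w^I_i}.$$

For any $I \neq J$, there is some $i \in [n]$ such that $u^J_i = w^I_i$, meaning that $g_I(u^J) = 0$ for $I \neq J$. But $g_I(u^I) = 1$ 
by definition, so this $g_I$ has the desired properties. □ 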
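
The construction in Lemma B.2 can be sketched in a few lines. The code below assumes the product formula $g_I(x) = \prod_i (x_i - w^I_i)/(u^I_i - w^I_i)$ for the indicator functions; the helper names (`vertex_u`, `vertex_w`, `make_g`, `subsets`) and the sample points `r`, `s` are hypothetical and chosen only for illustration. It checks that $g_I(u^J) = \delta_{IJ}$ on every pair of vertices.

```python
from itertools import combinations

def vertex_u(r, s, I):
    # u^I: coordinate i equals s[i] if i is in I, and r[i] otherwise.
    return [s[i] if i in I else r[i] for i in range(len(r))]

def vertex_w(r, s, I):
    # w^I: coordinate i equals r[i] if i is in I, and s[i] otherwise.
    return [r[i] if i in I else s[i] for i in range(len(r))]

def make_g(r, s, I):
    # Assumed form of g_I: prod_i (x_i - w^I_i) / (u^I_i - w^I_i).
    # Requires r[i] != s[i] for every i, so no denominator vanishes.
    u, w = vertex_u(r, s, I), vertex_w(r, s, I)
    def g(x):
        value = 1.0
        for xi, ui, wi in zip(x, u, w):
            value *= (xi - wi) / (ui - wi)
        return value
    return g

def subsets(n):
    # All subsets of {0, ..., n-1} (0-based indices stand in for [n]).
    for k in range(n + 1):
        for c in combinations(range(n), k):
            yield frozenset(c)

r, s = [1.0, 2.0, -1.0], [3.0, 5.0, 4.0]
for I in subsets(3):
    row = [make_g(r, s, I)(vertex_u(r, s, J)) for J in subsets(3)]
    print(sorted(I), row)
```

Each printed row should contain a single 1 in the position where $J = I$. Because every $g_I$ is affine in each coordinate, these functions also sum to the constant function 1 on all of $[r, s]$, which is one way to see that they span the multilinear functions.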

Lemma B.3. For any $r, s \in \mathbb{R}^n$ and any $x_j$, there is a basis $\{f_\alpha\}$ of the space of multilinear functions such 
that $f_\alpha$ is non-decreasing in $x_j$ on $[r, s]$. 

Proof. It suffices to consider the case where $r_i \neq s_i$ for all $i$, as otherwise we may pick $r', s'$ with $[r, s] \subseteq [r', s']$ 
and $r'_i \neq s'_i$ for all $i$. Further, we may assume that $r_i < s_i$ for all $i$, as otherwise we may simply exchange 
$r_i$ and $s_i$. Now, for $J \subseteq [n] - \{j\}$, consider the multilinear functions 

$$g_{J \cup \{j\}} \quad \text{and} \quad g_{J \cup \{j\}} + g_J$$

given by Lemma B.2. It is clear that these form a basis for the space of all multilinear functions because 
$\{g_I\}_{I \subseteq [n]}$ does. Further, notice that 

$$\partial_j g_{J \cup \{j\}} \quad \text{and} \quad \partial_j (g_{J \cup \{j\}} + g_J)$$

are multilinear functions in $x_1, \ldots, \widehat{x_j}, \ldots, x_n$ (with $x_j$ omitted) which are non-negative on all vertices of $[r, s]$, 
hence non-negative on $[r, s]$. Therefore, they give the desired basis for the space of multilinear functions consisting of 
functions non-decreasing in $x_j$ on $[r, s]$. □ 



The existence of the basis of Lemma B.3 allows us to convert Additivity to linearity as follows. 



Lemma B.4. Fix $r, s \in \mathbb{R}^n$ and a variable $x_j$, and let $\phi$ be an additive functional on the space of multilinear 
functions. If $\phi(f) \geq 0$ for $f$ non-decreasing in $x_j$ on $[r, s]$, then $\phi$ is linear. 

Proof. Let $\{f_\alpha\}$ be the basis given by Lemma B.3. By additivity, it suffices to check that $\phi$ is linear on 
$\mathrm{span}(f_\alpha)$ for each $\alpha$. But $\phi(c f_\alpha) \geq 0$ for any $c > 0$, hence $\phi$ is additive and non-decreasing on $\mathrm{span}(f_\alpha)$. 
It is therefore linear on $\mathrm{span}(f_\alpha)$ as a monotone solution to the Cauchy functional equation (see [?] or the 
original paper of [7]). □ 

Appendix C. Proof of Theorem 4.1 



In this appendix, we give a computational proof of Theorem 4.1 which was omitted from the main text 
to streamline the exposition. First we need a technical lemma. 

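Lemma C.1. For non-negative integers $i, j$ we have 

$$\int_0^1 x^i (1 - x)^j \, dx = \frac{i! \, j!}{(i + j + 1)!}.$$

Proof. We induct on $i$. For $i = 0$, the result is clear. Now, suppose that the result holds for some $i - 1$ 
and all $j$. In this case, integration by parts gives that 

$$\int_0^1 x^i (1 - x)^j \, dx = \left[ -\frac{1}{j + 1} x^i (1 - x)^{j+1} \right]_0^1 + \int_0^1 i x^{i-1} \frac{1}{j + 1} (1 - x)^{j+1} \, dx$$

$$= \frac{i}{j + 1} \int_0^1 x^{i-1} (1 - x)^{j+1} \, dx = \frac{i}{j + 1} \cdot \frac{(i - 1)! \, (j + 1)!}{(i + j + 1)!} = \frac{i! \, j!}{(i + j + 1)!},$$

which completes the proof. 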



□ 
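
The identity in Lemma C.1 is easy to confirm exactly for small exponents. The sketch below, with hypothetical function names, computes the integral by expanding $(1 - x)^j$ with the binomial theorem and integrating term by term in rational arithmetic, then compares it with the closed form $i! \, j! / (i + j + 1)!$.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(i, j):
    # Exact integral of x**i * (1 - x)**j over [0, 1], using
    # (1 - x)**j = sum_k C(j, k) * (-x)**k and integrating each monomial.
    return sum(Fraction((-1) ** k * comb(j, k), i + k + 1) for k in range(j + 1))

def closed_form(i, j):
    # Right-hand side of the lemma: i! j! / (i + j + 1)!.
    return Fraction(factorial(i) * factorial(j), factorial(i + j + 1))

for i in range(4):
    for j in range(4):
        print(i, j, beta_integral(i, j), closed_form(i, j))
```

This is a finite check rather than a proof, but it exercises the same quantity that the proof of Theorem 4.1 uses with $i = |J|$ and $j = n - 1 - |J|$.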



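Proof of Theorem 4.1. By Additivity, it is enough to consider $f(r) = r_{i_1} r_{i_2} \cdots r_{i_k}$, since Lemma 3.1 shows 
that the two methods agree for additively separable functions. Further, if $f(r)$ does not depend on the value 
of $r_i$, then the attribution to variable $r_i$ is $0$ by Dummy, so in fact it is enough to consider $f(r) = r_1 \cdots r_n$. 

In this case, recall that the Aumann-Shapley method is the affine path attribution method for $\gamma_i(t) = t$, 
so the attributions are given by 

$$z_i^{AS}(r, s) = \int_0^1 \partial_i f(\gamma_{r,s}(t)) \, (\gamma_{r,s})_i'(t) \, dt = (s_i - r_i) \int_0^1 \prod_{j \in [n] - \{i\}} \big( (1 - t) r_j + t s_j \big) \, dt$$

$$= (s_i - r_i) \sum_{J \subseteq [n] - \{i\}} \left( \int_0^1 t^{|J|} (1 - t)^{n - 1 - |J|} \, dt \right) \prod_{j \in J} s_j \prod_{j \in [n] - J - \{i\}} r_j,$$

which is of the form 

$$(s_i - r_i) \sum_{J \subseteq [n] - \{i\}} c_J \prod_{j \in J} s_j \prod_{j \in [n] - J - \{i\}} r_j$$

for the constants 

$$c_J = \int_0^1 t^{|J|} (1 - t)^{n - 1 - |J|} \, dt.$$

On the other hand, each affine path attribution method for $\gamma^\sigma$ assigns to variable $i$ the attribution 

$$z_i^\sigma(r, s) = (s_i - r_i) \prod_{\sigma(j) < \sigma(i)} r_j \prod_{\sigma(j) > \sigma(i)} s_j.$$

Therefore, the attribution assigned to variable $i$ under Shapley-Shubik is 

$$z_i^{SS}(r, s) = \frac{1}{n!} (s_i - r_i) \sum_{\sigma \in S_n} \prod_{\sigma(j) < \sigma(i)} r_j \prod_{\sigma(j) > \sigma(i)} s_j = (s_i - r_i) \sum_{J \subseteq [n] - \{i\}} \frac{|J|! \, (n - 1 - |J|)!}{n!} \prod_{j \in J} s_j \prod_{j \in [n] - \{i\} - J} r_j,$$

so it suffices for us to show that 

$$\int_0^1 t^{|J|} (1 - t)^{n - 1 - |J|} \, dt = \frac{|J|! \, (n - 1 - |J|)!}{n!},$$

which follows by taking $i = |J|$ and $j = n - 1 - |J|$ in Lemma C.1. 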



□ 
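
The agreement of the two methods on $f(r) = r_1 \cdots r_n$ can also be checked directly on a small example. The following sketch is an illustration under stated assumptions, not the efficient implementation discussed in the main text: the function names and the sample points are hypothetical, the Aumann-Shapley integral is evaluated exactly by polynomial arithmetic in $t$, and the Shapley-Shubik value is computed by brute-force enumeration of all $n!$ orders.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def poly_mul(p, q):
    # Multiply two polynomials in t given as coefficient lists (lowest degree first).
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out

def aumann_shapley(r, s, i):
    # For f(x) = x_1 * ... * x_n along the straight path r + t (s - r):
    # z_i = (s_i - r_i) * integral over [0, 1] of prod_{j != i} (r_j + t (s_j - r_j)).
    poly = [Fraction(1)]
    for j in range(len(r)):
        if j != i:
            poly = poly_mul(poly, [Fraction(r[j]), Fraction(s[j] - r[j])])
    integral = sum(c / (k + 1) for k, c in enumerate(poly))
    return (s[i] - r[i]) * integral

def shapley_shubik(r, s, i):
    # Average, over all orders in which variables move from r to s, of the
    # marginal change in the product when variable i moves. Averaging over all
    # orders makes the labeling convention for "earlier" and "later" immaterial.
    n = len(r)
    total = Fraction(0)
    for order in permutations(range(n)):
        pos = order.index(i)
        term = Fraction(s[i] - r[i])
        for j in order[:pos]:
            term *= s[j]  # variables that have already moved to their final value
        for j in order[pos + 1:]:
            term *= r[j]  # variables still at their initial value
        total += term
    return total / factorial(n)

r, s = [1, 2, 3], [4, 6, 5]
for i in range(3):
    print(i, aumann_shapley(r, s, i), shapley_shubik(r, s, i))
```

For these inputs the two columns coincide exactly, and the attributions sum to $f(s) - f(r) = 120 - 6 = 114$.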
Appendix D. Affine path attribution methods 

The Aumann-Shapley and the Shapley-Shubik methods are both affine path attribution methods. The 
following lemma demonstrates why they satisfy Affine Scale Invariance. 

Lemma D.1. Every affine path attribution method satisfies Affine Scale Invariance. 

Proof. Let $z_i$ be the affine single-path attribution method corresponding to $\gamma$. For any $c, d > 0$, set 
$g(r_1, \ldots, r_n) = f(r_1, \ldots, (r_j - d)/c, \ldots, r_n)$, $r' = (r_1, \ldots, c r_j + d, \ldots, r_n)$, and $s' = (s_1, \ldots, c s_j + d, \ldots, s_n)$. 
Then, taking $\tau_{ij}(c) = c$ if $i = j$ and $\tau_{ij}(c) = 1$ otherwise, we have 

$$z_i(r', s', g) = \int_0^1 \partial_i g(\gamma_{r',s'}(t)) \, (\gamma_{r',s'})_i'(t) \, dt = \int_0^1 \frac{1}{\tau_{ij}(c)} \partial_i f(\gamma_{r,s}(t)) \cdot \tau_{ij}(c) \, (\gamma_{r,s})_i'(t) \, dt = z_i(r, s, f).$$

The result follows because Affine Scale Invariance is preserved under convex combinations. 



□ 
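
To make the invariance concrete, the sketch below checks it numerically for the straight-line (Aumann-Shapley) path only. Everything here is an assumption made for illustration: the characteristic function `f`, the points `r` and `s`, the constants `c` and `d`, and the use of a midpoint rule with central finite differences in place of exact integration.

```python
import math

def partial(f, x, i, h=1e-5):
    # Central finite-difference estimate of the i-th partial derivative of f at x.
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2.0 * h)

def aumann_shapley(f, r, s, i, steps=1000):
    # Midpoint-rule estimate of the integral of d_i f along the straight path
    # from r to s, multiplied by the constant path speed (s_i - r_i).
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        point = [a + t * (b - a) for a, b in zip(r, s)]
        total += partial(f, point, i)
    return (s[i] - r[i]) * total / steps

def rescale(f, r, s, j, c, d):
    # Build g, r', s' as in the proof: variable j is measured as c * x_j + d.
    def g(x):
        y = list(x)
        y[j] = (y[j] - d) / c
        return f(y)
    r2, s2 = list(r), list(s)
    r2[j] = c * r[j] + d
    s2[j] = c * s[j] + d
    return g, r2, s2

def f(x):
    # Hypothetical characteristic function, chosen only for illustration.
    return x[0] * x[1] ** 2 + math.sin(x[0]) * x[2]

r, s = [0.5, 1.0, 2.0], [1.5, 3.0, 2.5]
g, r2, s2 = rescale(f, r, s, j=1, c=2.0, d=3.0)
for i in range(3):
    print(i, aumann_shapley(f, r, s, i), aumann_shapley(g, r2, s2, i))
```

The two printed columns should agree up to floating-point and discretization error, mirroring the cancellation of $\tau_{ij}(c)$ in the proof.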